ON CERTAIN DISTRIBUTIONS DERIVED FROM THE MULTINOMIAL DISTRIBUTION¹


By Solomon Kullback


1. Introduction. With the multinomial distribution as a background, there may be derived a number of distributions which are of interest in certain practical applications. Several of these distributions are here presented and the theory is illustrated by specific examples.


2. Preliminary data. In the discussion of the distributions to be considered 
there are needed certain factorial sums whose values are now to be derived. 


In the following discussion only positive integral values (including zero) are 
to be considered. 


There is desired the value, in terms of N, n, r, of

(2.1)    $f_r(n, N) = \sum \frac{N!}{x_1!\, x_2! \cdots x_n!}$

where the summation is for all values of $x_1, x_2, \cdots, x_n$ such that $x_1 + x_2 + \cdots + x_n = N$ and no x is equal to r.

Let us first consider the case for r = 0; i.e., we desire a value for the sum in (2.1) for all values of $x_1, x_2, \cdots, x_n$ such that $x_1 + x_2 + \cdots + x_n = N$ and no x is equal to zero. By the multinomial theorem, we have that²

(2.2)    $(a_1 + a_2 + \cdots + a_n)^N = \sum \frac{N!}{x_1!\, x_2! \cdots x_n!}\, a_1^{x_1} a_2^{x_2} \cdots a_n^{x_n}$

where the summation is for all values of $x_1, x_2, \cdots, x_n$ such that $x_1 + x_2 + \cdots + x_n = N$. If $a_1 = a_2 = \cdots = a_n = 1$, then

(2.3)    $n^N = \sum \frac{N!}{x_1!\, x_2! \cdots x_n!}, \qquad x_1 + x_2 + \cdots + x_n = N.$


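The sum in (2.3) may however be rearranged into the sum of a number of terms as follows:

(2.4)    $n^N = \sum \frac{N!}{x_1!\, x_2! \cdots x_n!}, \qquad x_1 + x_2 + \cdots + x_n = N$, no x = 0;

         $+\; n \sum \frac{N!}{x_1!\, x_2! \cdots x_{n-1}!}, \qquad x_1 + x_2 + \cdots + x_{n-1} = N$, no x = 0;

         $+\; \cdots$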





1 Presented to the Institute of Mathematical Statistics January 2, 1936.
2 H. S. Hall & S. R. Knight, Higher Algebra, MacMillan & Co., 4th Ed. (1924), Chap. 15.
Thus we may rewrite (2.3) as

(2.5)    $n^N = f_0(n, N) + n f_0(n-1, N) + \frac{n(n-1)}{2!} f_0(n-2, N) + \cdots + \binom{n}{r} f_0(n-r, N) + \cdots$

Replacing n by n - 1 in (2.5) there is obtained

(2.6)    $(n-1)^N = f_0(n-1, N) + (n-1) f_0(n-2, N) + \cdots + \binom{n-1}{r} f_0(n-r-1, N) + \cdots$

Multiplying (2.6) by n and subtracting the result from (2.5), there is obtained

(2.7)    $n^N - n(n-1)^N = f_0(n, N) - \frac{n(n-1)}{2!} f_0(n-2, N) - \cdots - r\binom{n}{r+1} f_0(n-r-1, N) - \cdots$

Replacing n by n - 2 in (2.5) there is obtained

(2.8)    $(n-2)^N = f_0(n-2, N) + (n-2) f_0(n-3, N) + \cdots + \binom{n-2}{r-1} f_0(n-r-1, N) + \cdots$

Multiplying (2.8) by n(n - 1)/2! and adding the result to (2.7), there is obtained

(2.9)    $n^N - n(n-1)^N + \frac{n(n-1)}{2!}(n-2)^N = f_0(n, N) + \frac{n(n-1)(n-2)}{3!} f_0(n-3, N) + \cdots + \binom{r}{2}\binom{n}{r+1} f_0(n-r-1, N) + \cdots$

Continuing this process, there is finally obtained the result that

(2.10)    $f_0(n, N) = n^N - n(n-1)^N + \frac{n(n-1)}{2!}(n-2)^N - \cdots + (-1)^{n-1}\, n \cdot 1^N$

It may be shown³ that the right side of (2.10) is $\Delta^n x^N$ for x = 0. The author has elsewhere⁴ obtained (2.10), but by a special procedure not applicable to the general case.

We may readily verify (2.10), for example, for n = 3, N = 5. If $x_1 + x_2 + x_3 = 5$ and no x = 0, then the sets of solutions are (3,1,1), (1,3,1), (1,1,3), (2,2,1), (2,1,2), (1,2,2), and $f_0(3,5) = 3 \cdot 5!/3! + 3 \cdot 5!/(2!\,2!) = 150$. From (2.10) there is obtained $f_0(3,5) = 3^5 - 3 \cdot 2^5 + 3 \cdot 2/2 = 150$.


3 E. T. Whittaker & G. Robinson, The Calculus of Observations, Blackie & Son Ltd. (1924).
4 S. Kullback, "On the Bernoulli Distribution," Bull. Am. Math. Soc., December, 1935.







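For the general case, we return again to (2.3) and rearrange the right side into the sum of a number of terms as follows:

(2.11)    $n^N = \sum \frac{N!}{x_1!\, x_2! \cdots x_n!}, \qquad x_1 + x_2 + \cdots + x_n = N$, no x = r;

          $+\; \frac{n}{r!} \sum \frac{N!}{x_1!\, x_2! \cdots x_{n-1}!}, \qquad x_1 + x_2 + \cdots + x_{n-1} = N - r$, no x = r;

          $+\; \frac{n(n-1)}{2!\,(r!)^2} \sum \frac{N!}{x_1!\, x_2! \cdots x_{n-2}!}, \qquad x_1 + x_2 + \cdots + x_{n-2} = N - 2r$, no x = r;

          $+\; \cdots$

Thus we may rewrite (2.3) as

(2.12)    $n^N = f_r(n, N) + \frac{n N^{(r)}}{r!} f_r(n-1, N-r) + \frac{n(n-1) N^{(2r)}}{2!\,(r!)^2} f_r(n-2, N-2r) + \cdots$

where $N^{(k)} = N(N-1)(N-2) \cdots (N-k+1)$.

Replacing n by n - 1 and N by N - r in (2.12) there is obtained

(2.13)    $(n-1)^{N-r} = f_r(n-1, N-r) + \frac{(n-1)(N-r)^{(r)}}{r!} f_r(n-2, N-2r) + \cdots$

Multiplying (2.13) by $n N^{(r)}/r!$ and subtracting the result from (2.12), there is obtained

(2.14)    $n^N - \frac{n N^{(r)}}{r!}(n-1)^{N-r} = f_r(n, N) - \frac{n(n-1) N^{(2r)}}{2!\,(r!)^2} f_r(n-2, N-2r) - \cdots$

By continuing this process, in a manner similar to that used for the case r = 0, there is finally obtained

(2.15)    $f_r(n, N) = n^N - \frac{n N^{(r)}}{r!}(n-1)^{N-r} + \frac{n(n-1) N^{(2r)}}{2!\,(r!)^2}(n-2)^{N-2r} - \cdots + (-1)^k \binom{n}{k} \frac{N^{(kr)}}{(r!)^k}(n-k)^{N-kr} + \cdots$

By setting r = 0 in (2.15), there is of course obtained the value already found in (2.10).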


We may readily verify (2.15), for example, for n = 3, N = 5, r = 2. If $x_1 + x_2 + x_3 = 5$ and no x = 2, then the sets of solutions are (5,0,0), (0,5,0), (0,0,5), (4,1,0), (1,4,0), (1,0,4), (4,0,1), (0,1,4), (0,4,1), (3,1,1), (1,3,1), (1,1,3), and $f_2(3,5) = 3 \cdot 5!/5! + 6 \cdot 5!/4! + 3 \cdot 5!/3! = 93$. From (2.15) there is obtained $f_2(3,5) = 3^5 - 3 \cdot 5 \cdot 4 \cdot 2^3/2! + 3 \cdot 2 \cdot 5 \cdot 4 \cdot 3 \cdot 2/(2!\,(2!)^2) = 93$.
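The two verifications above can also be repeated mechanically. The following is a minimal Python sketch, not part of the original paper: the function names `falling`, `f_closed` and `f_brute` are illustrative assumptions. It evaluates the closed form (2.15) and compares it with a direct enumeration of the sum (2.1).

```python
from itertools import product
from math import comb, factorial


def falling(N, k):
    """Falling factorial N^(k) = N(N-1)...(N-k+1), with N^(0) = 1."""
    out = 1
    for j in range(k):
        out *= N - j
    return out


def f_closed(r, n, N):
    """Closed form (2.15); terms with k*r > N vanish because N^(kr) = 0."""
    total = 0
    for k in range(n + 1):
        if k * r > N:
            break
        term = comb(n, k) * falling(N, k * r) * (n - k) ** (N - k * r)
        total += (-1) ** k * (term // factorial(r) ** k)
    return total


def f_brute(r, n, N):
    """Direct evaluation of (2.1): sum N!/(x_1!...x_n!) with no x equal to r."""
    total = 0
    for xs in product(range(N + 1), repeat=n):
        if sum(xs) == N and r not in xs:
            coef = factorial(N)
            for x in xs:
                coef //= factorial(x)
            total += coef
    return total


print(f_closed(0, 3, 5), f_closed(2, 3, 5))
```

The enumeration grows as (N+1)^n, so it is only meant as a check for small n and N.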

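The same method of procedure may be applied to evaluate

(2.16)    $f_{rs \cdots t}(n, N) = \sum \frac{N!}{x_1!\, x_2! \cdots x_n!}, \qquad x_1 + x_2 + \cdots + x_n = N$, no x = r, s, $\cdots$, or t.

Thus, there is derived the result that

(2.17)    $f_{rs}(n, N) = n^N - n\left(\frac{N^{(r)}(n-1)^{N-r}}{r!} + \frac{N^{(s)}(n-1)^{N-s}}{s!}\right)$

          $+\; n(n-1)\left(\frac{N^{(2r)}(n-2)^{N-2r}}{2!\,(r!)^2} + \frac{N^{(r+s)}(n-2)^{N-r-s}}{r!\,s!} + \frac{N^{(2s)}(n-2)^{N-2s}}{2!\,(s!)^2}\right)$

          $-\; n(n-1)(n-2)\left(\frac{N^{(3r)}(n-3)^{N-3r}}{3!\,(r!)^3} + \frac{N^{(2r+s)}(n-3)^{N-2r-s}}{2!\,(r!)^2\,(s!)} + \frac{N^{(r+2s)}(n-3)^{N-r-2s}}{2!\,(r!)\,(s!)^2} + \frac{N^{(3s)}(n-3)^{N-3s}}{3!\,(s!)^3}\right) + \cdots$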
We may readily verify (2.17), for example, for n = 3, N = 5, r = 0, s = 2. If $x_1 + x_2 + x_3 = 5$ and no x = 0 or 2, then the sets of solutions are (3,1,1), (1,3,1), (1,1,3) and $f_{02}(3,5) = 3 \cdot 5!/3! = 60$. From (2.17) there is obtained $f_{02}(3,5) = 3^5 - 3(2^5 + 5 \cdot 4 \cdot 2^3/2) + 3 \cdot 2\,(1/2! + 5 \cdot 4/2! + 5 \cdot 4 \cdot 3 \cdot 2/(2!)^3) = 60$.
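As a cross-check of (2.17), the following Python sketch sums the terms of (2.17) written in a general form. The general term used here, with j events fixed at r and k events fixed at s, is an assumption inferred from the terms displayed in (2.17), and the names `f_rs` and `f_rs_brute` are illustrative only. It requires r and s to be different.

```python
from itertools import product
from math import factorial


def falling(N, k):
    """Falling factorial N^(k) = N(N-1)...(N-k+1), with N^(0) = 1."""
    out = 1
    for j in range(k):
        out *= N - j
    return out


def f_rs(r, s, n, N):
    """Sum of the terms of (2.17), assuming the general term
    (-1)^(j+k) n!/(j! k! (n-j-k)!) N^(jr+ks) (n-j-k)^(N-jr-ks) / ((r!)^j (s!)^k)."""
    total = 0
    for j in range(n + 1):
        for k in range(n + 1 - j):
            used = j * r + k * s
            if used > N:
                continue  # N^(used) = 0
            m = n - j - k
            ways = factorial(n) // (factorial(j) * factorial(k) * factorial(m))
            term = ways * falling(N, used) * m ** (N - used)
            term //= factorial(r) ** j * factorial(s) ** k
            total += (-1) ** (j + k) * term
    return total


def f_rs_brute(r, s, n, N):
    """Direct evaluation of (2.16) for two excluded values r and s."""
    total = 0
    for xs in product(range(N + 1), repeat=n):
        if sum(xs) == N and r not in xs and s not in xs:
            coef = factorial(N)
            for x in xs:
                coef //= factorial(x)
            total += coef
    return total


print(f_rs(0, 2, 3, 5))
```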


It will be shown later (see section 8) that

(2.18)    $f_r(n, N) = f_{rs}(n, N) + \frac{n N^{(s)}}{s!} f_{rs}(n-1, N-s) + \frac{n(n-1) N^{(2s)}}{2!\,(s!)^2} f_{rs}(n-2, N-2s) + \cdots$

(2.19)    $f_s(n, N) = f_{rs}(n, N) + \frac{n N^{(r)}}{r!} f_{rs}(n-1, N-r) + \frac{n(n-1) N^{(2r)}}{2!\,(r!)^2} f_{rs}(n-2, N-2r) + \cdots$

From (2.18) and (2.19) there may be derived, by a method similar to that employed in deriving (2.15), that

(2.20)    $f_{rs}(n, N) = f_r(n, N) - \frac{n N^{(s)}}{s!} f_r(n-1, N-s) + \frac{n(n-1) N^{(2s)}}{2!\,(s!)^2} f_r(n-2, N-2s) - \cdots$

This latter result also follows from (2.17) and (2.15).
Let us now consider the following generalization of (2.1). There is desired, in terms of N, n, r, $a_1, a_2, \cdots, a_n$, the value of

(2.21)    $F_r(n, N, a_1, a_2, \cdots, a_n) = \sum \frac{N!}{x_1!\, x_2! \cdots x_n!}\, a_1^{x_1} a_2^{x_2} \cdots a_n^{x_n}$

where $a_1, a_2, \cdots, a_n$ are constants and the summation is for all values of $x_1, x_2, \cdots, x_n$ such that $x_1 + x_2 + \cdots + x_n = N$ and no x = r. The method of procedure is the same as that for the case already considered, viz. when $a_1 = a_2 = \cdots = a_n = 1$.
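The sum in (2.2) may be rearranged into the sum of a number of terms as follows:

(2.22)    $(a_1 + a_2 + \cdots + a_n)^N = \sum \frac{N!}{x_1!\, x_2! \cdots x_n!}\, a_1^{x_1} a_2^{x_2} \cdots a_n^{x_n}, \qquad x_1 + x_2 + \cdots + x_n = N$, no x = r;

          $+\; \cdots$

          $+\; \frac{a_1^r a_2^r \cdots a_k^r}{(r!)^k} \sum \frac{N!}{x_{k+1}! \cdots x_n!}\, a_{k+1}^{x_{k+1}} \cdots a_n^{x_n} + \cdots + \frac{a_{n-k+1}^r \cdots a_n^r}{(r!)^k} \sum \frac{N!}{x_1! \cdots x_{n-k}!}\, a_1^{x_1} \cdots a_{n-k}^{x_{n-k}},$

          $\qquad x_1 + x_2 + \cdots + x_{n-k} = N - kr$, etc., no x = r;

          $+\; \cdots$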





For convenience, let us write

(2.23)    $A(n, N) = (a_1 + a_2 + \cdots + a_n)^N$
          $A_i(n-1, N) = (a_1 + \cdots + a_{i-1} + a_{i+1} + \cdots + a_n)^N$
          $A_{ij}(n-2, N) = (a_1 + \cdots + a_{i-1} + a_{i+1} + \cdots + a_{j-1} + a_{j+1} + \cdots + a_n)^N$
          $\cdots$
          $G_r(n, N) = F_r(n, N, a_1, a_2, \cdots, a_n)$
          $G_r(n-1, N, a_i) = F_r(n-1, N, a_1, a_2, \cdots, a_{i-1}, a_{i+1}, \cdots, a_n)$
          $G_r(n-2, N, a_i, a_j) = F_r(n-2, N, a_1, \cdots, a_{i-1}, a_{i+1}, \cdots, a_{j-1}, a_{j+1}, \cdots, a_n)$

so that (2.2) may be written as

(2.24)    $A(n, N) = G_r(n, N) + \frac{N^{(r)}}{r!} \sum_{i=1}^{n} a_i^r\, G_r(n-1, N-r, a_i) + \frac{N^{(2r)}}{2!\,(r!)^2} \sum_{i,j=1}^{n} a_i^r a_j^r\, G_r(n-2, N-2r, a_i, a_j) + \cdots \qquad (i \ne j$, etc.)
From (2.24), there are obtained n equations

(2.25)    $A_i(n-1, N-r) = G_r(n-1, N-r, a_i) + \frac{(N-r)^{(r)}}{r!} \sum_{j=1}^{n} a_j^r\, G_r(n-2, N-2r, a_i, a_j) + \cdots \qquad (i = 1, 2, \cdots, n;\ j \ne i)$

Multiplying the i-th equation of (2.25) by $a_i^r N^{(r)}/r!$, summing over i, and subtracting the result from (2.24), there is obtained

(2.26)    $A(n, N) - \frac{N^{(r)}}{r!} \sum_{i=1}^{n} a_i^r A_i(n-1, N-r) = G_r(n, N) - \frac{N^{(2r)}}{2!\,(r!)^2} \sum_{i,j=1}^{n} a_i^r a_j^r\, G_r(n-2, N-2r, a_i, a_j) - \cdots$

Continuing this procedure, there is finally obtained

(2.27)    $G_r(n, N) = F_r(n, N, a_1, a_2, \cdots, a_n) = A(n, N) - \frac{N^{(r)}}{r!} \sum_{i=1}^{n} a_i^r A_i(n-1, N-r) + \frac{N^{(2r)}}{2!\,(r!)^2} \sum_{i,j=1}^{n} a_i^r a_j^r A_{ij}(n-2, N-2r) - \cdots \qquad (i \ne j$, etc.)
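The weighted result (2.27) can be checked in the same way as (2.15). The sketch below is an illustration only: the names `G_closed` and `G_brute` and the sample constants are assumptions. It writes the k-th term of (2.27) as a sum over unordered sets of k indices, which equals the ordered sum divided by k!, and uses exact fractions.

```python
from fractions import Fraction
from itertools import combinations, product
from math import factorial


def falling(N, k):
    """Falling factorial N^(k) = N(N-1)...(N-k+1), with N^(0) = 1."""
    out = 1
    for j in range(k):
        out *= N - j
    return out


def G_closed(r, a, N):
    """Right side of (2.27) for constants a = (a_1, ..., a_n)."""
    n = len(a)
    total = Fraction(0)
    for k in range(n + 1):
        if k * r > N:
            break  # N^(kr) = 0 from here on
        coef = Fraction(falling(N, k * r), factorial(r) ** k)
        for chosen in combinations(range(n), k):
            weight = Fraction(1)
            for i in chosen:
                weight *= Fraction(a[i]) ** r
            rest = sum((Fraction(a[i]) for i in range(n) if i not in chosen),
                       Fraction(0))
            total += (-1) ** k * coef * weight * rest ** (N - k * r)
    return total


def G_brute(r, a, N):
    """Direct evaluation of (2.21): no x is allowed to equal r."""
    total = Fraction(0)
    for xs in product(range(N + 1), repeat=len(a)):
        if sum(xs) == N and r not in xs:
            term = Fraction(factorial(N))
            for x, ai in zip(xs, a):
                term *= Fraction(ai) ** x / factorial(x)
            total += term
    return total


a = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # hypothetical constants
print(G_closed(2, a, 5), G_brute(2, a, 5))
```

With all constants equal to 1 the same function reproduces the values of $f_r(n, N)$ from (2.15).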


Similar results are obtainable for

(2.28)    $G_{rs \cdots t} = F_{rs \cdots t}(n, N, a_1, a_2, \cdots, a_n) = \sum \frac{N!}{x_1!\, x_2! \cdots x_n!}\, a_1^{x_1} a_2^{x_2} \cdots a_n^{x_n}$

where the summation is for all values of $x_i$ such that $x_1 + x_2 + \cdots + x_n = N$, and no x = r, s, $\cdots$, or t.

Thus, it will be shown later (see section 8), that

(2.29)    $G_r(n, N) = G_{rs}(n, N) + \frac{N^{(s)}}{s!} \sum_{i=1}^{n} a_i^s\, G_{rs}(n-1, N-s, a_i) + \frac{N^{(2s)}}{2!\,(s!)^2} \sum_{i,j=1}^{n} a_i^s a_j^s\, G_{rs}(n-2, N-2s, a_i, a_j) + \cdots \qquad (i \ne j$, etc.)

Corresponding to the derivation of (2.27), there is obtained from (2.29) the fact that

(2.30)    $G_{rs}(n, N) = G_r(n, N) - \frac{N^{(s)}}{s!} \sum_{i=1}^{n} a_i^s\, G_r(n-1, N-s, a_i) + \frac{N^{(2s)}}{2!\,(s!)^2} \sum_{i,j=1}^{n} a_i^s a_j^s\, G_r(n-2, N-2s, a_i, a_j) - \cdots \qquad (i \ne j$, etc.)


3. The problem to be studied. Consider a trial in which one of n mutually exclusive events may occur, with the respective probabilities of occurrence $p_1, p_2, \cdots, p_n$ where $p_1 + p_2 + \cdots + p_n = 1$. The probabilities of the various combinations of events which are possible in N trials are given by the terms of the expansion of $(p_1 + p_2 + \cdots + p_n)^N$.

In the N trials some of the possible events may not occur, others may occur once, twice, etc. It is desired to study the distribution of the number of events which do not occur; the distribution of the number of events which occur once each, etc. The simultaneous distributions of the events above described are also to be studied.

For example, the possible event may be the occurrence of a digit. A study of a sequence of random digits, in sets of ten, yielded the following three sample sets.

[Fig. 1. Frequencies of the digits 0 to 9 in three sample sets of ten random digits.]


In the first set three events do not occur, four occur once each, and three occur twice each. In the second set one event does not occur, eight events occur once each, and one event occurs twice; etc.
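The classification just described can be sketched in a few lines of Python. The sample set below is hypothetical (it is not taken from Fig. 1), and the name `occupancy_profile` is illustrative; the function counts, for one set of digits, how many of the n possible events occur 0, 1, 2, ... times.

```python
from collections import Counter


def occupancy_profile(digits, n=10):
    """Map k -> number of the n possible events that occur exactly k times."""
    counts = Counter(digits)
    per_event = [counts.get(d, 0) for d in range(n)]
    return Counter(per_event)


sample = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]  # hypothetical set of ten digits
profile = occupancy_profile(sample)
print(profile[0], profile[1], profile[2])
```

For this hypothetical set three digits do not occur, four occur once each, and three occur twice each.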


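4. Distribution of the number of events not occurring. To obtain the distribution of the number of events which do not occur, there is applied to the expansion of $(p_1 + p_2 + \cdots + p_n)^N$ a procedure similar to that employed in section 2.

Thus, if $\pi_{r0}$ represents the probability for r events not occurring, then

(4.1)    $\pi_{00} = \sum \frac{N!}{x_1!\, x_2! \cdots x_n!}\, p_1^{x_1} p_2^{x_2} \cdots p_n^{x_n}, \qquad x_1 + x_2 + \cdots + x_n = N$, no x = 0;

         $\pi_{10} = \sum \frac{N!}{x_2! \cdots x_n!}\, p_2^{x_2} \cdots p_n^{x_n} + \cdots + \sum \frac{N!}{x_1! \cdots x_{n-1}!}\, p_1^{x_1} \cdots p_{n-1}^{x_{n-1}}$, no x = 0;

         $\cdots$

         $\pi_{r0} = \sum \frac{N!}{x_{r+1}! \cdots x_n!}\, p_{r+1}^{x_{r+1}} \cdots p_n^{x_n} + \cdots + \sum \frac{N!}{x_1! \cdots x_{n-r}!}\, p_1^{x_1} \cdots p_{n-r}^{x_{n-r}}$, no x = 0.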
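Employing (2.21), we may write (4.1) as

(4.2)    $\pi_{00} = F_0(n, N, p_1, p_2, \cdots, p_n)$
         $\pi_{10} = F_0(n-1, N, p_2, \cdots, p_n) + \cdots + F_0(n-1, N, p_1, p_2, \cdots, p_{n-1})$
         $\cdots$
         $\pi_{r0} = F_0(n-r, N, p_{r+1}, \cdots, p_n) + \cdots + F_0(n-r, N, p_1, \cdots, p_{n-r})$

Since $p_1 + p_2 + \cdots + p_n = 1$ there is found from (2.27) that

(4.3)    $\pi_{00} = 1 - \sum_{i=1}^{n} (1 - p_i)^N + \frac{1}{2!} \sum_{i,j=1}^{n} (1 - p_i - p_j)^N - \frac{1}{3!} \sum_{i,j,k=1}^{n} (1 - p_i - p_j - p_k)^N + \cdots$

         $\pi_{10} = \sum_{i=1}^{n} (1 - p_i)^N - \sum_{i,j=1}^{n} (1 - p_i - p_j)^N + \frac{1}{2!} \sum_{i,j,k=1}^{n} (1 - p_i - p_j - p_k)^N - \cdots$

         $\pi_{20} = \frac{1}{2!} \left\{ \sum_{i,j=1}^{n} (1 - p_i - p_j)^N - \sum_{i,j,k=1}^{n} (1 - p_i - p_j - p_k)^N + \cdots \right\}$

         $\cdots$

         $(i \ne j$, etc.)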


The factorial moments⁵ of the distribution given by (4.3) are easily derived. The first factorial moment is given by $\sigma_1 = \pi_{10} + 2\pi_{20} + 3\pi_{30} + \cdots + r\pi_{r0} + \cdots$ and the summation of the proper terms in (4.3) yields

(4.4)    $\sigma_1 = \sum_{i=1}^{n} (1 - p_i)^N$

In general, the r-th factorial moment, given by $\sigma_r = \sum_{k=r} k(k-1) \cdots (k-r+1)\, \pi_{k0}$, is

(4.5)    $\sigma_r = \sum_{a,b,\cdots,r=1}^{n} (1 - p_a - p_b - \cdots - p_r)^N, \qquad (a \ne b$, etc.).

Indeed, (4.3) illustrates the fact that, if f(x) is the probability that a discontinuous variate takes the value x, then⁶

(4.6)    $f(x) = \frac{1}{x!} \sum_{k=0}^{\infty} (-1)^k\, \sigma_{x+k}/k!$


5 J. F. Steffensen, Interpolation (1927), p. 101. 


6 J. F. Steffensen, ‘‘Factorial Moments and Discontinuous Frequency Functions” 
Skandinavisk Aktuarietidskrift, Vol. VI (1923), pp. 73-89. 
The moments about any constant of the distribution given by (4.3) may be derived from the factorial moments by the relation⁷

(4.7)    $E(x - a)^r = (1 + \sigma_1 \Delta + \sigma_2 \Delta^2/2! + \cdots + \sigma_r \Delta^r/r!) \cdot \xi^r \qquad (\xi = -a)$

where $\Delta$ is the difference operator of the calculus of finite differences, and $\xi$ is replaced by (-a) after the indicated operations have been performed.


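Of special interest is the case when $p_1 = p_2 = \cdots = p_n = \frac{1}{n}$, for which (4.3) becomes

(4.8)    $\pi_{00} = \left(\frac{1}{n}\right)^N f_0(n, N) = \left(\frac{1}{n}\right)^N \Delta^n 0^N$

         $\pi_{10} = \left(\frac{1}{n}\right)^N n f_0(n-1, N) = \left(\frac{1}{n}\right)^N n\, \Delta^{n-1} 0^N$

         $\cdots$

where $f_0(n, N)$ and $\Delta^n 0^N$ are as defined in section 2. The probabilities in (4.8) are the respective terms of the expansion of $\left(\frac{1}{n}\right)^N (1 + \Delta)^n \cdot 0^N$.

For this case the r-th factorial moment becomes

(4.9)    $\sigma_r = n(n-1) \cdots (n-r+1)\,(n-r)^N/n^N$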


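There is presented an example of the distribution (4.8) for the case n = N = 10. It is found that⁸

$\Delta\, 0^{10}$ = 1                $\Delta^6 0^{10}$ = 16435440
$\Delta^2 0^{10}$ = 1022             $\Delta^7 0^{10}$ = 29635200
$\Delta^3 0^{10}$ = 55980            $\Delta^8 0^{10}$ = 30240000
$\Delta^4 0^{10}$ = 818520           $\Delta^9 0^{10}$ = 16329600
$\Delta^5 0^{10}$ = 5103000          $\Delta^{10} 0^{10}$ = 3628800

$\pi_{00}$ = .000362880          $\pi_{50}$ = .128595600
$\pi_{10}$ = .016329600          $\pi_{60}$ = .017188920
$\pi_{20}$ = .136080000          $\pi_{70}$ = .000671760
$\pi_{30}$ = .355622400          $\pi_{80}$ = .000004599
$\pi_{40}$ = .345144240          $\pi_{90}$ = .000000001

$\sigma_1$ = 3.486784401          m = 3.486784401
$\sigma_2$ = 9.663676416          $\sigma^2$ = 0.992795358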
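The table for n = N = 10 can be reproduced with exact arithmetic. The following Python sketch is an illustration, with function names that are assumptions; it computes the differences of zero from the right side of (2.10), the probabilities (4.8), and the factorial moments (4.9), and then recovers the probabilities from the moments by (4.6).

```python
from fractions import Fraction
from math import comb, factorial


def diff_of_zero(k, N):
    """Delta^k 0^N = sum_j (-1)^j C(k, j) (k - j)^N, the right side of (2.10)."""
    return sum((-1) ** j * comb(k, j) * (k - j) ** N for j in range(k + 1))


def pi_not_occurring(r, n, N):
    """(4.8): probability that exactly r of n equally likely events never occur."""
    return Fraction(comb(n, r) * diff_of_zero(n - r, N), n ** N)


def sigma(r, n, N):
    """(4.9): r-th factorial moment for equal probabilities."""
    return Fraction(factorial(n) // factorial(n - r) * (n - r) ** N, n ** N)


def pi_from_moments(x, n, N):
    """(4.6): recover the probability of x from the factorial moments."""
    total = sum((-1) ** k * sigma(x + k, n, N) / factorial(k)
                for k in range(n - x + 1))
    return total / factorial(x)


n = N = 10
probs = [pi_not_occurring(r, n, N) for r in range(n + 1)]
mean = sum(r * p for r, p in enumerate(probs))
variance = sum(r * r * p for r, p in enumerate(probs)) - mean ** 2
print(float(probs[3]), float(mean), float(variance))
```

The variance is obtained here as $\sigma_2 + \sigma_1 - \sigma_1^2$, which is the relation used to pass from the factorial moments to $\sigma^2$ in the table.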


7 This result is derived as follows: $(x - a)^r = (1 + \Delta)^x \cdot (-a)^r$; $E(x - a)^r = \sum_x (x - a)^r f(x) = \left(\sum_x (1 + \Delta)^x f(x)\right) \cdot (-a)^r = \left(\sum_x (1 + x\Delta + x(x-1)\Delta^2/2! + \cdots) f(x)\right) \cdot (-a)^r$. For a bivariate distribution it may be shown similarly that, symbolically, $E((x - a)^r (y - b)^s) = \{\exp(\sigma_{1.} \Delta_1 + \sigma_{.1} \Delta_2)\} \cdot (-a)^r (-b)^s$ where $\sigma_{1.}^m \sigma_{.1}^n = \sigma_{mn}$ and $\Delta_1$ operates only on a and $\Delta_2$ operates only on b. A similar result may be derived for a multivariate distribution.

8 cf. Whittaker & Robinson, op. cit. p. 7.
The observed distribution was obtained by distributing 200 sets of ten digits each, the digits being found in Tippett's Random Sampling Numbers.⁹ The results obtained are given in Fig. 2. Three of the 200 observed sets were illustrated in section 3.

The agreement between observed results and theoretical values is gratifying.


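5. Distribution of the number of events which occur once each. Let $\pi_{k1}$ represent the probability that there are k events which occur once each. Thus, the various probabilities, obtained by rearranging the terms of the expansion of $(p_1 + p_2 + \cdots + p_n)^N$, are as follows:

(5.1)    $\pi_{01} = \sum \frac{N!}{x_1! \cdots x_n!}\, p_1^{x_1} \cdots p_n^{x_n}, \qquad x_1 + x_2 + \cdots + x_n = N$, no x = 1;

         $\pi_{11} = p_1 \sum \frac{N!}{x_2! \cdots x_n!}\, p_2^{x_2} \cdots p_n^{x_n} + \cdots + p_n \sum \frac{N!}{x_1! \cdots x_{n-1}!}\, p_1^{x_1} \cdots p_{n-1}^{x_{n-1}}, \qquad x_1 + \cdots + x_{n-1} = N - 1$, etc., no x = 1;

         $\cdots$

         $\pi_{k1} = p_1 p_2 \cdots p_k \sum \frac{N!}{x_{k+1}! \cdots x_n!}\, p_{k+1}^{x_{k+1}} \cdots p_n^{x_n} + \cdots + p_{n-k+1} \cdots p_n \sum \frac{N!}{x_1! \cdots x_{n-k}!}\, p_1^{x_1} \cdots p_{n-k}^{x_{n-k}}, \qquad x_1 + \cdots + x_{n-k} = N - k$, etc., no x = 1;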


[Fig. 2. Observed and theoretical frequencies of the number of events not occurring, for the 200 sets of ten digits. Observed parameters: $\bar{x}$ = 3.46, $s^2$ = 1.0984. Theoretical parameters: $\sigma_1$ = 3.49, $\sigma_2$ = 9.66, m = 3.49, $\sigma^2$ = 0.99.]


9 L. H. C. Tippett, Random Sampling Numbers, Tracts for Computers, No. XV (1927), London.
In view of (2.21) and (2.27), it is found that (5.1) becomes

(5.2)    $\pi_{01} = 1 - N \sum_{i=1}^{n} p_i (1 - p_i)^{N-1} + \frac{N(N-1)}{2!} \sum_{i,j=1}^{n} p_i p_j (1 - p_i - p_j)^{N-2} - \cdots$

         $\pi_{11} = N \left\{ \sum_{i=1}^{n} p_i (1 - p_i)^{N-1} - (N-1) \sum_{i,j=1}^{n} p_i p_j (1 - p_i - p_j)^{N-2} + \cdots \right\}$

         $\pi_{21} = \frac{N(N-1)}{2!} \left\{ \sum_{i,j=1}^{n} p_i p_j (1 - p_i - p_j)^{N-2} - \cdots \right\}$

         $\cdots \qquad (i \ne j$, etc.)

From (5.2) there is readily derived the fact that

(5.3)    $\sigma_r = N(N-1) \cdots (N-r+1) \sum_{a,b,\cdots,r=1}^{n} p_a p_b \cdots p_r (1 - p_a - p_b - \cdots - p_r)^{N-r}, \qquad (a \ne b$, etc.)

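For the case in which $p_1 = p_2 = \cdots = p_n = \frac{1}{n}$, the distribution in (5.2) becomes

(5.4)    $\pi_{01} = \left(\frac{1}{n}\right)^N f_1(n, N)$

         $\pi_{11} = \left(\frac{1}{n}\right)^N n N f_1(n-1, N-1)$

         $\pi_{21} = \left(\frac{1}{n}\right)^N \binom{n}{2} N^{(2)} f_1(n-2, N-2)$

         $\cdots$

where $f_1(n, N)$ and $N^{(k)}$ have been defined in section 2. For this case (5.3) becomes

(5.5)    $\sigma_r = N^{(r)} n^{(r)} (n-r)^{N-r}/n^N$

Evaluation of (5.4) and (5.5) for n = N = 10 yields,

$\pi_{01}$ = .00811639          $\pi_{41}$ = .27052704          $\pi_{81}$ = .01632960
$\pi_{11}$ = .04794633          $\pi_{51}$ = .15621984          $\pi_{91}$ = .00000000¹⁰
$\pi_{21}$ = .14082336          $\pi_{61}$ = .12700800          $\pi_{10,1}$ = .00036288
$\pi_{31}$ = .21089376          $\pi_{71}$ = .02177280

$\sigma_1$ = 3.87420489           m = 3.87420489
$\sigma_2$ = 13.58954496          $\sigma^2$ = 2.45428632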

10 For the case n = N = 10 there cannot be 9 events occurring once each, since then the 
tenth event must also occur once. 
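The values obtained from (5.4) and (5.5) for n = N = 10 can be recomputed exactly. The sketch below is illustrative only; the names `falling`, `f_closed` and `pi_once` are assumptions, and $f_1$ is evaluated from (2.15) with r = 1.

```python
from fractions import Fraction
from math import comb, factorial


def falling(N, k):
    """Falling factorial N^(k) = N(N-1)...(N-k+1), with N^(0) = 1."""
    out = 1
    for j in range(k):
        out *= N - j
    return out


def f_closed(r, n, N):
    """Closed form (2.15) for f_r(n, N)."""
    total = 0
    for k in range(n + 1):
        if k * r > N:
            break
        term = comb(n, k) * falling(N, k * r) * (n - k) ** (N - k * r)
        total += (-1) ** k * (term // factorial(r) ** k)
    return total


def pi_once(k, n, N):
    """(5.4): probability that exactly k events occur once each, p_i = 1/n."""
    return Fraction(comb(n, k) * falling(N, k) * f_closed(1, n - k, N - k), n ** N)


n = N = 10
probs = [pi_once(k, n, N) for k in range(n + 1)]
sigma_1 = sum(k * p for k, p in enumerate(probs))
sigma_2 = sum(k * (k - 1) * p for k, p in enumerate(probs))
print([round(float(p), 8) for p in probs])
print(float(sigma_1), float(sigma_2))
```

The probability for k = 9 comes out as exactly zero, in agreement with footnote 10.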
The observed distribution, given in Fig. 3, was obtained from the 200 sets previously considered.

The agreement between the observed results and theoretical values is gratifying.








6. Distribution of the number of events which occur r times each. Let $\pi_{kr}$ represent the probability that there are k events occurring r times each. Thus, the various probabilities, obtained by rearranging the terms of the expansion of $(p_1 + p_2 + \cdots + p_n)^N$, are as follows:




















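| No. of events occurring once each (x) | Observed frequency (f) | Theoretical frequency | fx  | fx(x-1) |
|    0 |   1 |   1.62 |   0 |    0 |
|    1 |  10 |   9.58 |  10 |    0 |
|    2 |  30 |  28.16 |  60 |   60 |
|    3 |  37 |  42.18 | 111 |  222 |
|    4 |  62 |  54.10 | 248 |  744 |
|    5 |  27 |  31.24 | 135 |  540 |
|    6 |  22 |  25.40 | 132 |  660 |
|    7 |   3 |   4.36 |  21 |  126 |
|    8 |   8 |   3.26 |  64 |  448 |
|    9 |   0 |   0.00 |   0 |    0 |
|   10 |   0 |   0.08 |   0 |    0 |
| Total | 200 | 199.98 | 781 | 2800 |

Observed parameters: $\hat{\sigma}_1$ = 3.905, $\hat{\sigma}_2$ = 14.000, $\bar{x}$ = 3.905, $s^2$ = 2.656.
Theoretical parameters: $\sigma_1$ = 3.874, $\sigma_2$ = 13.590, m = 3.874, $\sigma^2$ = 2.454.

Fig. 3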


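(6.1)    $\pi_{0r} = \sum \frac{N!}{x_1! \cdots x_n!}\, p_1^{x_1} \cdots p_n^{x_n}, \qquad x_1 + x_2 + \cdots + x_n = N$, no x = r;

         $\pi_{1r} = \frac{p_1^r}{r!} \sum \frac{N!}{x_2! \cdots x_n!}\, p_2^{x_2} \cdots p_n^{x_n} + \cdots + \frac{p_n^r}{r!} \sum \frac{N!}{x_1! \cdots x_{n-1}!}\, p_1^{x_1} \cdots p_{n-1}^{x_{n-1}}, \qquad x_1 + \cdots + x_{n-1} = N - r$, etc., no x = r;

         $\cdots$

         $\pi_{kr} = \frac{p_1^r p_2^r \cdots p_k^r}{(r!)^k} \sum \frac{N!}{x_{k+1}! \cdots x_n!}\, p_{k+1}^{x_{k+1}} \cdots p_n^{x_n} + \cdots + \frac{p_{n-k+1}^r \cdots p_n^r}{(r!)^k} \sum \frac{N!}{x_1! \cdots x_{n-k}!}\, p_1^{x_1} \cdots p_{n-k}^{x_{n-k}}, \qquad x_1 + \cdots + x_{n-k} = N - kr$, etc., no x = r;

         $\cdots$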
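In view of (2.21) and (2.27) it is found that (6.1) becomes

(6.2)    $\pi_{0r} = 1 - \frac{N^{(r)}}{r!} \sum_{i=1}^{n} p_i^r (1 - p_i)^{N-r} + \frac{N^{(2r)}}{2!\,(r!)^2} \sum_{i,j=1}^{n} p_i^r p_j^r (1 - p_i - p_j)^{N-2r} - \cdots$

         $\pi_{1r} = \frac{N^{(r)}}{r!} \sum_{i=1}^{n} p_i^r (1 - p_i)^{N-r} - \frac{N^{(2r)}}{(r!)^2} \sum_{i,j=1}^{n} p_i^r p_j^r (1 - p_i - p_j)^{N-2r} + \cdots$

         $\pi_{2r} = \frac{N^{(2r)}}{2!\,(r!)^2} \sum_{i,j=1}^{n} p_i^r p_j^r (1 - p_i - p_j)^{N-2r} - \cdots$

         $\cdots \qquad (i \ne j$, etc.)

From (6.2) there is readily derived the fact that

(6.3)    $\sigma_k = \frac{N^{(kr)}}{(r!)^k} \sum_{a,b,\cdots,k=1}^{n} p_a^r p_b^r \cdots p_k^r (1 - p_a - p_b - \cdots - p_k)^{N-kr}, \qquad (a \ne b$, etc.)

For r = 0, 1, (6.2) and (6.3) reduce to the values previously derived.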


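For the case in which $p_1 = p_2 = \cdots = p_n = \frac{1}{n}$, the distribution in (6.2) becomes

(6.4)    $\pi_{0r} = \left(\frac{1}{n}\right)^N f_r(n, N)$

         $\cdots$

         $\pi_{kr} = \left(\frac{1}{n}\right)^N \binom{n}{k} \frac{N^{(kr)}}{(r!)^k} f_r(n-k, N-kr)$

where $f_r(n, N)$ has been defined in section 2. For this case (6.3) becomes

(6.5)    $\sigma_k = N^{(kr)} n^{(k)} (n-k)^{N-kr}/(r!)^k\, n^N$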


7. Simultaneous distribution of the number of events not occurring, and of the number of events occurring once each. The probabilities for the simultaneous occurrence of the various combinations of the number of events not occurring, and of the number of events occurring once each, are given by rearranging the terms of the expansion of $(p_1 + p_2 + \cdots + p_n)^N$, and are given as in Fig. 4.

In Fig. 4 none of the subscripts take on equal values simultaneously, and $G_{01}$ has been defined in section 2. Summation of the values in the k-th column of Fig. 4 yields the probability that there are (k - 1) events not occurring. Comparison with (4.2) yields

(7.1)    $F_0(n, N, p_1, p_2, \cdots, p_n) = G_0(n, N) = G_{01}(n, N) + N \sum_{i=1}^{n} p_i\, G_{01}(n-1, N-1, p_i) + \frac{N^{(2)}}{2!} \sum_{i,j=1}^{n} p_i p_j\, G_{01}(n-2, N-2, p_i, p_j) + \cdots, \qquad (i \ne j$, etc.)
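Fig. 4. Columns: number of events not occurring. Rows: number of events occurring once each.

| Once each | 0 not occurring | 1 not occurring | $\cdots$ |
| 0 | $G_{01}(n, N)$ | $\sum_{i=1}^{n} G_{01}(n-1, N, p_i)$ | $\cdots$ |
| 1 | $N \sum_{i=1}^{n} p_i\, G_{01}(n-1, N-1, p_i)$ | $N \sum_{i,j=1}^{n} p_i\, G_{01}(n-2, N-1, p_i, p_j)$ | $\cdots$ |
| 2 | $\frac{N^{(2)}}{2!} \sum_{i,j=1}^{n} p_i p_j\, G_{01}(n-2, N-2, p_i, p_j)$ | $\frac{N^{(2)}}{2!} \sum_{i,j,k=1}^{n} p_i p_j\, G_{01}(n-3, N-2, p_i, p_j, p_k)$ | $\cdots$ |
| $\cdots$ | $\cdots$ | $\cdots$ | $\cdots$ |

General entry (r events not occurring, s events occurring once each):

$\frac{N^{(s)}}{r!\, s!} \sum_{a,b,\cdots,s,\alpha,\beta,\cdots,\rho=1}^{n} p_a p_b \cdots p_s\, G_{01}(n-r-s, N-s, p_a, \cdots, p_s, p_\alpha, \cdots, p_\rho)$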





Summation of the values in the k-th row of Fig. 4 yields the probability that there are (k - 1) events occurring once each. Comparison with (5.2) and (2.27) yields

(7.2)    $F_1(n, N, p_1, p_2, \cdots, p_n) = G_1(n, N) = G_{01}(n, N) + \sum_{i=1}^{n} G_{01}(n-1, N, p_i) + \frac{1}{2!} \sum_{i,j=1}^{n} G_{01}(n-2, N, p_i, p_j) + \cdots, \qquad (i \ne j$, etc.)

If we use x to represent the number of events not occurring, and y the number of events occurring once each, then it is found that

(7.3)    $E(x^{(r)} y^{(s)}) = N^{(s)} \sum_{a,b,\cdots,s,\alpha,\beta,\cdots,\rho=1}^{n} p_a p_b \cdots p_s (1 - p_a - \cdots - p_s - p_\alpha - \cdots - p_\rho)^{N-s}, \qquad (a \ne b$, etc.).

If ${}_0\bar{x}_{k1}$ represents the average number of events not occurring, when there are k events occurring once each, then from Fig. 4 there is found that

(7.4)    ${}_0\bar{x}_{01} = \dfrac{\sum_{i=1}^{n} G_{01}(n-1, N, p_i) + 2 \sum_{i,j=1}^{n} G_{01}(n-2, N, p_i, p_j)/2! + 3 \sum_{i,j,k=1}^{n} G_{01}(n-3, N, p_i, p_j, p_k)/3! + \cdots}{G_{01}(n, N) + \sum_{i=1}^{n} G_{01}(n-1, N, p_i) + \sum_{i,j=1}^{n} G_{01}(n-2, N, p_i, p_j)/2! + \cdots}$
In view of (7.2), (7.4) reduces to

(7.5)    ${}_0\bar{x}_{01} = \left(\sum_{i=1}^{n} G_1(n-1, N, p_i)\right) / G_1(n, N)$

A similar procedure yields, in general

(7.6)    ${}_0\bar{x}_{k1} = \dfrac{\sum_{a,b,\cdots,k,\alpha=1}^{n} p_a p_b \cdots p_k\, G_1(n-k-1, N-k, p_a, p_b, \cdots, p_k, p_\alpha)}{\sum_{a,b,\cdots,k=1}^{n} p_a p_b \cdots p_k\, G_1(n-k, N-k, p_a, p_b, \cdots, p_k)} \qquad (a \ne b$, etc.)

If ${}_1\bar{y}_{k0}$ represents the average number of events occurring once each, when there are k events not occurring, then from Fig. 4, there is found that

(7.7)    ${}_1\bar{y}_{00} = \dfrac{N \left\{ \sum_{i=1}^{n} p_i\, G_{01}(n-1, N-1, p_i) + 2(N-1) \sum_{i,j=1}^{n} p_i p_j\, G_{01}(n-2, N-2, p_i, p_j)/2! + \cdots \right\}}{G_{01}(n, N) + N \sum_{i=1}^{n} p_i\, G_{01}(n-1, N-1, p_i) + N^{(2)} \sum_{i,j=1}^{n} p_i p_j\, G_{01}(n-2, N-2, p_i, p_j)/2! + \cdots} \qquad (i \ne j$, etc.)

In view of (7.1), (7.7) reduces to

(7.8)    ${}_1\bar{y}_{00} = \left(N \sum_{i=1}^{n} p_i\, G_0(n-1, N-1, p_i)\right) / G_0(n, N)$

A similar procedure yields, in general

(7.9)    ${}_1\bar{y}_{k0} = \dfrac{N \sum_{a,b,\cdots,k,\alpha=1}^{n} p_\alpha\, G_0(n-k-1, N-1, p_a, p_b, \cdots, p_k, p_\alpha)}{\sum_{a,b,\cdots,k=1}^{n} G_0(n-k, N, p_a, p_b, \cdots, p_k)} \qquad (a \ne b$, etc.)


For the case in which $p_1 = p_2 = \cdots = p_n = \frac{1}{n}$, as may be found from Fig. 4, the probability for the simultaneous occurrence of r events not occurring, and s events occurring once each, is given by

(7.10)    $\left(\frac{1}{n}\right)^N \frac{n^{(r+s)} N^{(s)}}{r!\, s!}\, f_{01}(n-r-s, N-s)$

For this case (7.1), (7.2), (7.3), (7.6), and (7.9) yield respectively

(7.11)    $f_0(n, N) = f_{01}(n, N) + nN f_{01}(n-1, N-1) + \binom{n}{2} N^{(2)} f_{01}(n-2, N-2) + \cdots$

(7.12)    $f_1(n, N) = f_{01}(n, N) + n f_{01}(n-1, N) + \binom{n}{2} f_{01}(n-2, N) + \cdots$

(7.13)    $\sigma_{rs} = N^{(s)} n^{(r+s)} (n-r-s)^{N-s}/n^N$

(7.14)    ${}_0\bar{x}_{k1} = (n-k) f_1(n-k-1, N-k)/f_1(n-k, N-k)$

(7.15)    ${}_1\bar{y}_{k0} = N(n-k) f_0(n-k-1, N-1)/f_0(n-k, N)$

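Let us consider again the case when $p_1 = p_2 = \cdots = p_n = \frac{1}{n}$ and n = N = 10. Evaluating (7.14) and (7.15) by means of (2.15) yields

(7.16)    ${}_0\bar{x}_{01}$ = 5.71          ${}_0\bar{x}_{51}$ = 3.02
          ${}_0\bar{x}_{11}$ = 5.21          ${}_0\bar{x}_{61}$ = 2.10
          ${}_0\bar{x}_{21}$ = 4.51          ${}_0\bar{x}_{71}$ = 2.00
          ${}_0\bar{x}_{31}$ = 4.10          ${}_0\bar{x}_{81}$ = 1.00
          ${}_0\bar{x}_{41}$ = 3.28          ${}_0\bar{x}_{91}$ = 0.00

(7.17)    ${}_1\bar{y}_{00}$ = 10.00         ${}_1\bar{y}_{50}$ = 1.83
          ${}_1\bar{y}_{10}$ = 8.00          ${}_1\bar{y}_{60}$ = 0.89
          ${}_1\bar{y}_{20}$ = 6.16          ${}_1\bar{y}_{70}$ = 0.27
          ${}_1\bar{y}_{30}$ = 4.50          ${}_1\bar{y}_{80}$ = 0.02
          ${}_1\bar{y}_{40}$ = 3.05          ${}_1\bar{y}_{90}$ = 0.00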

The 200 sets of observations already considered yielded the simultaneous distribution given in Fig. 5.

Fig. 5

The distribution in Fig. 5 yields $\hat{\sigma}_{11}$ = 11.89; (7.13) yields $\sigma_{11}$ = 12.07959552. The agreement between the observed results in Fig. 5 and the theoretical values in (7.16) and (7.17) is gratifying.


8. Simultaneous distribution of the number of events which occur r times each, and of the number of events which occur s times each. The probabilities for the simultaneous occurrence of the various combinations of the number of events which occur r times each, and of the number of events which occur s times each, are obtained by rearranging the terms of the expansion of $(p_1 + p_2 + \cdots + p_n)^N$. If $\pi_{kr,ls}$ is the probability for the simultaneous occurrence of k events which occur r times each and l events which occur s times each, then

(8.1)    $\pi_{kr,ls} = \frac{N^{(kr+ls)}}{k!\, l!\, (r!)^k (s!)^l} \sum_{a,\cdots,k,\alpha,\cdots,\lambda=1}^{n} p_a^r \cdots p_k^r\, p_\alpha^s \cdots p_\lambda^s\, G_{rs}(n-k-l, N-kr-ls, p_a, \cdots, p_k, p_\alpha, \cdots, p_\lambda), \qquad (a \ne b$, etc.)

where $G_{rs}$ is defined in section 2.
From (8.1) and (6.2), there is derived, in a manner similar to the derivation of (7.1) and (7.2), the result that

(8.2)    $F_r(n, N, p_1, \cdots, p_n) = G_r(n, N) = G_{rs}(n, N) + \frac{N^{(s)}}{s!} \sum_{i=1}^{n} p_i^s\, G_{rs}(n-1, N-s, p_i) + \frac{N^{(2s)}}{2!\,(s!)^2} \sum_{i,j=1}^{n} p_i^s p_j^s\, G_{rs}(n-2, N-2s, p_i, p_j) + \cdots \qquad (i \ne j$, etc.)

and a similar result by interchanging r and s in (8.2).
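For the distribution given by (8.1), it is found that

(8.3)    $\sigma_{kl} = \frac{N^{(kr+ls)}}{(r!)^k (s!)^l} \sum_{a,\cdots,k,\alpha,\cdots,\lambda=1}^{n} p_a^r \cdots p_k^r\, p_\alpha^s \cdots p_\lambda^s\, (1 - p_a - \cdots - p_k - p_\alpha - \cdots - p_\lambda)^{N-kr-ls}, \qquad (a \ne b$, etc.)

If ${}_r\bar{x}_{ls}$ represents the average number of events which occur r times each when there are l events which occur s times each, then from (8.1) and (8.2), in a manner similar to the derivation of (7.6), it is found that

(8.4)    ${}_r\bar{x}_{ls} = \dfrac{(N-ls)^{(r)} \sum_{\alpha,\cdots,\lambda,a=1}^{n} p_\alpha^s \cdots p_\lambda^s\, p_a^r\, G_s(n-l-1, N-r-ls, p_\alpha, \cdots, p_\lambda, p_a)}{r! \sum_{\alpha,\cdots,\lambda=1}^{n} p_\alpha^s \cdots p_\lambda^s\, G_s(n-l, N-ls, p_\alpha, \cdots, p_\lambda)} \qquad (\alpha \ne \beta$, etc.)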
If ${}_s\bar{y}_{kr}$ represents the average number of events which occur s times each when there are k events which occur r times each, then by interchanging k and l, and r and s in (8.4), there is found

(8.5)    ${}_s\bar{y}_{kr} = \dfrac{(N-kr)^{(s)} \sum_{a,\cdots,k,\alpha=1}^{n} p_a^r \cdots p_k^r\, p_\alpha^s\, G_r(n-k-1, N-kr-s, p_a, \cdots, p_k, p_\alpha)}{s! \sum_{a,\cdots,k=1}^{n} p_a^r \cdots p_k^r\, G_r(n-k, N-kr, p_a, \cdots, p_k)} \qquad (a \ne b$, etc.)
For the case when $p_1 = p_2 = \cdots = p_n = \frac{1}{n}$, it is found that (8.1), (8.2), (8.3), (8.4), and (8.5) respectively yield

(8.6)    $\pi_{kr,ls} = \left(\frac{1}{n}\right)^N \frac{n^{(k+l)} N^{(kr+ls)}}{k!\, l!\, (r!)^k (s!)^l}\, f_{rs}(n-k-l, N-kr-ls)$

(8.7)    $f_r(n, N) = f_{rs}(n, N) + \frac{n N^{(s)}}{s!} f_{rs}(n-1, N-s) + \frac{n(n-1) N^{(2s)}}{2!\,(s!)^2} f_{rs}(n-2, N-2s) + \cdots$

(8.8)    $\sigma_{kl} = N^{(kr+ls)} n^{(k+l)} (n-k-l)^{N-kr-ls}/(r!)^k (s!)^l\, n^N$

(8.9)    ${}_r\bar{x}_{ls} = (n-l)(N-ls)^{(r)} f_s(n-l-1, N-r-ls)/r!\, f_s(n-l, N-ls)$

(8.10)    ${}_s\bar{y}_{kr} = (n-k)(N-kr)^{(s)} f_r(n-k-1, N-kr-s)/s!\, f_r(n-k, N-kr)$

For r = 0, s = 1, the results derived in this section of course reduce to those already derived in section 7.
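A direct check of (8.6) is that the probabilities sum to unity and reproduce the moment (7.13) when r = 0, s = 1. The Python sketch below is an illustration under stated assumptions: `f_rs` sums the terms of (2.17) in a general form inferred from the displayed terms (with r and s different), and `joint_prob` is a hypothetical name for (8.6).

```python
from fractions import Fraction
from math import factorial


def falling(N, k):
    """Falling factorial N^(k) = N(N-1)...(N-k+1), with N^(0) = 1."""
    out = 1
    for j in range(k):
        out *= N - j
    return out


def f_rs(r, s, n, N):
    """Terms of (2.17) summed over j events fixed at r and k events fixed at s."""
    total = 0
    for j in range(n + 1):
        for k in range(n + 1 - j):
            used = j * r + k * s
            if used > N:
                continue  # N^(used) = 0
            m = n - j - k
            ways = factorial(n) // (factorial(j) * factorial(k) * factorial(m))
            term = ways * falling(N, used) * m ** (N - used)
            term //= factorial(r) ** j * factorial(s) ** k
            total += (-1) ** (j + k) * term
    return total


def joint_prob(k, l, r, s, n, N):
    """(8.6): k events occur r times each and l events occur s times each."""
    used = k * r + l * s
    if k + l > n or used > N:
        return Fraction(0)
    ways = falling(n, k + l) * falling(N, used)
    denom = factorial(k) * factorial(l) * factorial(r) ** k * factorial(s) ** l
    return Fraction(ways * f_rs(r, s, n - k - l, N - used), denom * n ** N)


n = N = 10
table = {(k, l): joint_prob(k, l, 0, 1, n, N)
         for k in range(n + 1) for l in range(n + 1 - k)}
sigma_11 = sum(k * l * p for (k, l), p in table.items())
print(float(sum(table.values())), float(sigma_11))
```

With r = 0 and s = 1 the table is the theoretical counterpart of the simultaneous distribution of section 7.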


9. Conclusion. It is clear that the same method of procedure may be employed to study the simultaneous distribution of the number of events which occur r, s, $\cdots$, t, times each. However we will not continue the discussion any further.

We have thus seen that the multinomial distribution serves as the background for the study of a number of distributions which have certain practical applications.

The theory discussed herein has been illustrated by several examples which 
yielded gratifying agreement between observed and theoretical results. 





WASHINGTON, D. C. 















A PROBLEM IN LEAST SQUARES 


By Jan K. WISNIEWSKI 


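§1. We are dealing with two variables, the observed values of which are denoted x and y respectively. The pairs of observations are divided into r groups, numbering $n_1, n_2, \cdots, n_r$ pairs. Suppose in each group we determine a regression equation of the following shape:

$y_i = a_i + b_i x + \cdots + m_i x^s$    (1)

where $y_i$ denotes the value of the “dependent” variable obtained from the regression equation, while y without any subscript denotes its observed value. The r regression equations of type (1) are not assumed independent; on the contrary, we postulate that

$\sum_{1}^{r} y_i = a_0 + b_0 x + \cdots + m_0 x^s$    (2)

be fulfilled identically in x; $a_0, b_0, \cdots, m_0$ being predetermined numbers. This leads to the following conditions:

$\sum_{1}^{r} a_i = a_0, \quad \sum_{1}^{r} b_i = b_0, \quad \cdots, \quad \sum_{1}^{r} m_i = m_0.$    (3)

The magnitude to be minimized under the theory of least squares is now

$Z = \sum_{i=1}^{r-1} \sum\nolimits_i \left[y - (a_i + b_i x + \cdots + m_i x^s)\right]^2 + \sum\nolimits_r \left\{y - \left[\left(a_0 - \sum_{1}^{r-1} a_i\right) + \left(b_0 - \sum_{1}^{r-1} b_i\right) x + \cdots + \left(m_0 - \sum_{1}^{r-1} m_i\right) x^s\right]\right\}^2$    (4)

The normal equations derived from (4) are of the following shape:

$n_j a_j + n_r \sum_{1}^{r-1} a_i + b_j \sum\nolimits_j x + \left(\sum_{1}^{r-1} b_i\right)\left(\sum\nolimits_r x\right) + \cdots + m_j \sum\nolimits_j x^s + \left(\sum_{1}^{r-1} m_i\right)\left(\sum\nolimits_r x^s\right) = \sum\nolimits_j y - \sum\nolimits_r y + a_0 n_r + b_0 \sum\nolimits_r x + \cdots + m_0 \sum\nolimits_r x^s$

$a_j \sum\nolimits_j x + \left(\sum_{1}^{r-1} a_i\right)\left(\sum\nolimits_r x\right) + b_j \sum\nolimits_j x^2 + \left(\sum_{1}^{r-1} b_i\right)\left(\sum\nolimits_r x^2\right) + \cdots + m_j \sum\nolimits_j x^{s+1} + \left(\sum_{1}^{r-1} m_i\right)\left(\sum\nolimits_r x^{s+1}\right) = \sum\nolimits_j xy - \sum\nolimits_r xy + a_0 \sum\nolimits_r x + b_0 \sum\nolimits_r x^2 + \cdots + m_0 \sum\nolimits_r x^{s+1}$    (5)

$\cdots$

$a_j \sum\nolimits_j x^s + \left(\sum_{1}^{r-1} a_i\right)\left(\sum\nolimits_r x^s\right) + b_j \sum\nolimits_j x^{s+1} + \left(\sum_{1}^{r-1} b_i\right)\left(\sum\nolimits_r x^{s+1}\right) + \cdots + m_j \sum\nolimits_j x^{2s} + \left(\sum_{1}^{r-1} m_i\right)\left(\sum\nolimits_r x^{2s}\right) = \sum\nolimits_j x^s y - \sum\nolimits_r x^s y + a_0 \sum\nolimits_r x^s + b_0 \sum\nolimits_r x^{s+1} + \cdots + m_0 \sum\nolimits_r x^{2s}$

$\cdots$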

$\sum_i$ meaning a summation extended over the i-th group. As (1) is of the s-th degree, we have (s + 1)(r - 1) parameters to determine and as many equations, the problem thus being in theory solved.* As to the numerical solution, Doolittle's method or any other may be applied. We do not enter at present into the question of how much labor the actual solution would require.

Examples. Allen and Bowley in their book on “Family Expenditure” (London, 1935) assume the expenditure on some defined item f to be a linear function of the total expenditure e

$f = ke + c.$    (6)

Evidently $\sum k = 1$, $\sum c = 0$ (cfr. pp. 10-11). Another example I give in a paper on seasonal variation, which appeared in “Economic Studies” III (Kraków). Actual values y of a time series are assumed to be linear functions of certain “normal” values x

$y = a + bx$    (7)

a and b changing from month to month but constant from year to year. Then $\sum a = 0$, $\sum b = 12$.





§2. Methods of solution in special cases. The generally recognized methods of solving normal equations become extremely laborious as the product (s + 1)(r - 1) grows large. As a matter of fact, the amount of computer's work is approximately proportional to the cube of the number of parameters to determine. Therefore short cuts seem to be indispensable. A most elegant one is at our disposal in the special case¹ when the values of x in the several groups







* The remaining s + 1 parameters $a_r, b_r, \cdots, m_r$ are, of course, found from (3).
1 This seems to be realized in Allen and Bowley's work.
are identical, or, at least, the sums $n_i, \sum_i x, \sum_i x^2, \cdots, \sum_i x^{2s}$ are identical in i. Instead of (1) we shall write

$y_i = A_i X_0 + B_i X_1 + \cdots + M_i X_s$    (8)

where $X_0, X_1, \cdots, X_s$ are orthogonal polynomials, i.e. such that $\sum X_i X_j = 0$ if and only if $i \ne j$. In general, $X_h = x^h + a^h_{h-1} x^{h-1} + \cdots + a^h_0$, the coefficients being rational functions of $n, \sum x, \sum x^2, \cdots, \sum x^{2h-1}$.

The conditions (3) can now be replaced by a set of equivalent ones, viz.

$\sum A_i = A_0, \quad \sum B_i = B_0, \quad \cdots, \quad \sum M_i = M_0.$    (9)

How the actual values of $A_0, B_0, \cdots, M_0$ are found, will be shown in the next paragraph. The solution becomes now very easy, as the normal equations for the determination of each set of r - 1 parameters are independent, i.e. we can calculate the A's separately, then the B's etc., the order of solution being of no importance. Moreover the shape of the normal equations permits of considerable simplification of solution. Suppose we have to determine the values of the coefficients $K_i$ corresponding to $X_h$. The normal equations are now, after certain simplifications,

$2K_1 + K_2 + K_3 + \cdots + K_{r-1} = \frac{1}{\sum X_h^2}\left(\sum\nolimits_1 X_h y - \sum\nolimits_r X_h y\right) + K_0$

$K_1 + 2K_2 + K_3 + \cdots + K_{r-1} = \frac{1}{\sum X_h^2}\left(\sum\nolimits_2 X_h y - \sum\nolimits_r X_h y\right) + K_0$    (10)

$\cdots$

$K_1 + K_2 + K_3 + \cdots + 2K_{r-1} = \frac{1}{\sum X_h^2}\left(\sum\nolimits_{r-1} X_h y - \sum\nolimits_r X_h y\right) + K_0.$


Adding these equations, dividing the sum by r and subtracting the quotient from the j-th equation, we get

$K_j = \frac{\sum_j X_h y}{\sum X_h^2} - \frac{1}{r}\left(\sum_{i=1}^{r} \frac{\sum_i X_h y}{\sum X_h^2} - K_0\right).$    (11)

The first member of the right hand side of (11) should be regarded as the principal term: this is actually the value we would obtain for $K_j$, were this coefficient independent from the other K's. The second member is a correction term, the necessary amount of correction being distributed equally among the several K's. The simple solution given by (11) is only possible if the sum $\sum X_h^2$ is the same for each group. From the definition of $X_h$, we see that it is equivalent to saying that $n_i, \sum_i x, \sum_i x^2, \cdots, \sum_i x^{2h}$ be identical in i. As h increases to s, we come to the condition given at the beginning of this paragraph.
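The rule (11) is easy to try out for the linear case s = 1. The Python sketch below uses hypothetical data and illustrative names; it assumes three groups observed at the same x values, takes $X_0 = 1$ and $X_1 = x - \bar{x}$ as the orthogonal polynomials, and, for this linear case only, converts the conditions (3) into $\sum A_i = a_0 + b_0 \bar{x}$ and $\sum B_i = b_0$.

```python
from fractions import Fraction


def coefficients_eq11(X_values, ys_by_group, K0):
    """Formula (11): principal term minus an equally distributed correction.

    X_values are the values of one polynomial X_h, common to all groups."""
    sxx = sum(X * X for X in X_values)
    principal = [Fraction(sum(X * y for X, y in zip(X_values, ys))) / sxx
                 for ys in ys_by_group]
    correction = (sum(principal) - K0) / len(ys_by_group)
    return [t - correction for t in principal]


# Hypothetical data: three groups observed at the same four x values.
x = [Fraction(v) for v in (0, 1, 2, 3)]
x_bar = sum(x) / len(x)
X0 = [Fraction(1)] * len(x)        # X_0 = 1
X1 = [v - x_bar for v in x]        # X_1 = x - mean(x), orthogonal to X_0
ys_by_group = [[1, 2, 2, 4], [0, 1, 3, 3], [2, 2, 5, 6]]

a0, b0 = Fraction(0), Fraction(1)  # conditions (3): sum a_i = 0, sum b_i = 1
B0 = b0                            # required sum of the B_i
A0 = a0 + b0 * x_bar               # required sum of the A_i (a_i = A_i - B_i * x_bar)

A = coefficients_eq11(X0, ys_by_group, A0)
B = coefficients_eq11(X1, ys_by_group, B0)
a = [Ai - Bi * x_bar for Ai, Bi in zip(A, B)]
print(a, B)
```

Applying the formula to all r groups, as done here, gives the r-th coefficient directly instead of obtaining it from the condition (9); both routes agree because the corrected values sum to $K_0$.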
§3. If this condition is not fulfilled, we can, indeed, replace the power series
in x by orthogonal polynomials $X_{h\cdot j}$, the second subscript being appended
in order to show that the values of the X polynomials are no longer identical
for the several groups; these polynomials are now orthogonalized separately
within each group. But we are no longer able to predetermine the values of
$A_0, B_0, \ldots, M_0$, as they depend on each other; this will be made clear a little
later. Therefore we have to resort to an approximation: the values of the
parameters will not be found from simultaneous equations, but successively,
step by step, beginning with those corresponding to the highest degree of the
independent variable.

The values of $a_0, b_0, \ldots, m_0$ are given. It is evident that $m_0 = M_0$. The
j-th normal equation is now:


r—1 


M; a4 eu —_ My a ate (= M.) (7.x. = i a ‘ied Dor Xe-i- (12) 
We see at once that 
2 ° ° ‘g ° — : v o! 
ay = MiduKis + 2a Sew 2 Xe (13) 





Inserting this into (12) we get





— Mo 


Doi Xeiy 
2 
« Btw 1 Se 


Rhos Rakes - y, X?, 


(14) 








The second member of the right hand side of (14) is again a correction term,
the necessary amount of correction being distributed in inverse proportion to
$\sum X_{s\cdot j}^2$. Now we determine the value of $L_0$, this coefficient corresponding
to s − 1, the second highest degree of x, and calculate the several L’s from
equations strictly analogous to (14), thus accomplishing the second step of our
work, and so on, down to the A’s. $L_0$ is found from the following equation:


r 


Lo = ly — D [as-a(i) - Mi). (15) 


1 


To $a'_{s-1}$ is now appended a bracketed j, this to stress its variation from group
to group. We see from (15) that before the several M’s are calculated we are
not in a position to determine $L_0$. On the other hand, if $a'_{s-1}$ is the same for
all groups, the second member of the right hand side of (15) simply reduces
to $a'_{s-1} \cdot m_0$ and $L_0$ can be determined in advance, i.e. before calculating the
M’s. This is the case treated first (in §2). In any case, if no definite correlation
is to be expected between $a'_{s-1}(j)$ and $M_j$, the approximative method
developed here should give very nearly correct results. The writer applied
this method of solution to the simple problem of seasonal variation mentioned
in §1 and found the results very satisfactory.








A SIGNIFICANCE TEST FOR COMPONENT ANALYSIS 
By Paul G. Hoel
1. Introduction 


During the last few years several papers and books have been written on 
various aspects of what has been termed component or factor analysis. This 
analysis has arisen from the psychological problem of describing the results on a 
series of tests in terms of a few distinct abilities or components. In much of 
such work it is claimed that there does not exist more than a certain number 
of components, the material discarded in order to substantiate such a claim 
being considered as due to random errors of sampling or errors of measurement.
However, mere inspection of results or the calculation of standard errors of
residual correlations is hardly sufficient to justify such conclusions, and therefore
a significance test of some kind is necessary. Hotelling¹ considered such
a test but based it upon an uncertain analogy with the analysis of variance 
and upon the legitimacy of using standard errors. The purpose of this paper 
is to derive a test which is more general in scope and in which all assumptions 
are explicitly stated. 

If each test score is thought of as being made up of two parts, a true score 
and an error element, the assumption that there exist fewer components than
the number of tests implies that the scatter diagram of the true scores will lie 
in a space of correspondingly smaller dimensionality. Consequently, an ideal 
test for the number of components would be one which would test the rank 
of the true moment matrix. In the case of normally distributed variables, 
this line of approach leads one to the sampling distribution of the generalized 
variance. Unfortunately, this distribution appears in unintegrated form; how- 
ever, by considering its moments it is possible to find a good approximation 
to this exact distribution for samples which are not too small. 

The paper proceeds by first finding two approximation distributions for the 
generalized variance, one for samples which are not too small and one for large 
samples. It then considers the type of population from which it will be assumed 
the sample was drawn, and finally applies the test to two numerical examples 
from recent literature along such lines. 


2. Approximation Distributions 


Suppose that N individuals have been drawn at random from an n-variate
normal population whose distribution is expressed by

(1) $P(x_1, x_2, \ldots, x_n) = K e^{-\sum_{i,j=1}^{n} A_{ij} x_i x_j}$


1 Harold Hotelling, Analysis of a Complex of Statistical Variables into Principal Com- 
ponents, The Journal of Educational Psychology, September and October, 1933, pp. 21-25. 
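where $x_i = X_i - m_i$, $A_{ij} = \dfrac{\Delta_{ij}}{2\Delta\sigma_i\sigma_j}$, Δ is the determinant $|\rho_{ij}|$ and $\Delta_{ij}$ is the
cofactor of $\rho_{ij}$ in Δ, and $K = |A_{ij}|^{1/2}/\pi^{n/2}$. If the observed values of the
variables of the αth individual are denoted by $X_{i\alpha}$ (i = 1, 2, ..., n), then the
generalized sample variance is defined as $z = |a_{ij}|$, where
$a_{ij} = \frac{1}{N}\sum_{\alpha=1}^{N}(X_{i\alpha} - \bar X_i)(X_{j\alpha} - \bar X_j)$. Wilks² has shown that in sampling from a population of type (1)
the kth moment of the sampling distribution of z is given by

$$M_k = A^{-k}\,\frac{\Gamma\left(\frac{N-1}{2}+k\right)\Gamma\left(\frac{N-2}{2}+k\right)\cdots\Gamma\left(\frac{N-n}{2}+k\right)}{\Gamma\left(\frac{N-1}{2}\right)\Gamma\left(\frac{N-2}{2}\right)\cdots\Gamma\left(\frac{N-n}{2}\right)}$$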

where $A = N^n |A_{ij}|$. An inspection of the integrated form of the distribution
of z in the case of n = 1 and n = 2 suggests that there likely exists a function
of similar form for higher values of n whose kth moment can be made to differ
from $M_k$ only in higher powers of terms which contain $N^{-1}$ as a factor. An
investigation along such lines leads to the function














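(2) $g(z) = C z^{\frac{N-n}{2}-1} e^{-n(az)^{1/n}}$

where $C = \dfrac{a^{\frac{N-n}{2}}\, n^{\frac{n(N-n)}{2}-1}}{\Gamma\left(\frac{n(N-n)}{2}\right)}$, $a = Aq$, and $q = 1 - \dfrac{(n-1)(n-2)}{2N}$.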


It will be shown that the kth moment $M'_k$ of g(z) differs from $M_k$ only in terms
of magnitude less than the second and higher powers of $k^2n/N$ or $kn^2/N$.

Multiplying g(z) by $z^k$ and integrating over the entire range of z will yield
$M'_k$, which turns out to be

$$M'_k = a^{-k} n^{-nk}\,\frac{\Gamma\left(\frac{n(N-n)}{2}+nk\right)}{\Gamma\left(\frac{n(N-n)}{2}\right)}.$$


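Upon reducing the upper gamma function and performing successive steps of
simple algebra,

$$M'_k = a^{-k} n^{-nk}\left(\tfrac{n(N-n)}{2}+nk-1\right)\left(\tfrac{n(N-n)}{2}+nk-2\right)\cdots\left(\tfrac{n(N-n)}{2}\right)$$

$$= N^{nk} a^{-k} 2^{-nk}\left(1+\frac{2k-n-2/n}{N}\right)\left(1+\frac{2k-n-4/n}{N}\right)\cdots\left(1+\frac{2k-n-2nk/n}{N}\right).$$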

2 S. S. Wilks, Certain Generalizations in the Analysis of Variance, Biometrika, Vol.
XXIV, 1932, p. 477.
The terms in parentheses may be treated as the factored form of a polynomial
of the nkth degree in unity. Thus the quantities $\frac{2k-n-2/n}{N}$, etc., may be
treated as the zeros with signs changed of the corresponding polynomial in
x (say). As a result, the successive terms after the first in the non-factored
form of this polynomial in unity are the sums of the products of these quantities
taken one at a time, two at a time, etc. Upon performing this multiplication
and letting $\phi = N^n/2^nA$, $M'_k$ assumes the form

$$M'_k = \phi^k q^{-k}\left[1 - \frac{k(n^2-nk+1)}{N} + \cdots\right]$$

where the neglected terms are in magnitude less than the second and higher
powers of $k^2n/N$ or $kn^2/N$. If $M_k$ is handled in exactly the same manner, it
will be found that

$$M_k = A^{-k}\left(\tfrac{N-1}{2}+k-1\right)\cdots\left(\tfrac{N-1}{2}\right)\cdots\left(\tfrac{N-n}{2}+k-1\right)\cdots\left(\tfrac{N-n}{2}\right)$$

$$= N^{nk}A^{-k}2^{-nk}\left(1+\frac{2k-3}{N}\right)\cdots\left(1-\frac{n}{N}\right)$$

$$= \phi^k\left[1 - \frac{kn(n-2k+3)}{2N} + \cdots\right]$$


where the neglected terms are of the same order of magnitude as those neglected
in the approximation to $M'_k$. Before a comparison of $M'_k$ and $M_k$ is possible,
the factor $q^{-k}$ of $M'_k$ must be expanded and multiplied into the quantity in
brackets. This operation yields the result

$$M'_k = \phi^k\left[1 - \frac{kn(n-2k+3)}{2N} + \cdots\right].$$








Thus $M'_k$ and $M_k$ agree to within neglected terms. As a matter of fact, if
the values of the neglected terms are considered more carefully, it will be found
that the actual difference between $M_k$ and $M'_k$ is considerably less than the
given upper bound for the magnitude of neglected terms would indicate. For
example, when n = 5 the first term in the difference is $6k(k - .9)N^{-2}$, while
$625k^2N^{-2}$ or $25k^4N^{-2}$ is the upper bound for this term when only general results
are used. The general formula for the first term in this difference has been
obtained, but since the remaining terms have not been investigated and since
the type of problems to which the distribution g(z) is to be applied does not
justify this refinement, it will not be considered here. Consequently, if one
considers this distribution function as sufficiently determined by its low order
moments and if one applies g(z) only to problems in which N is fairly large
compared with $n^2$, then the function g(z) will give a good approximation to the
exact sampling distribution of z. Obviously, g(z) is identical with the exact
distribution for the known cases of n = 1 and n = 2. It is not possible under
the above expansions to vary the constants in the form of g(z) in such a manner
as to obtain an approximation whose kth moment will agree with $M_k$ to within
still higher powers of comparable terms.
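
The agreement between the exact moments and those of g(z) can be checked numerically. The following is only an illustrative sketch, not part of the original derivation. It assumes the gamma-product form of $M_k$, the form $M'_k = a^{-k}n^{-nk}\,\Gamma(n(N-n)/2+nk)/\Gamma(n(N-n)/2)$ with a = Aq, and the value q = 1 − (n − 1)(n − 2)/(2N); the function names are hypothetical.

```python
from math import lgamma, log, exp

def log_exact_moment(k, n, N, log_A=0.0):
    """log of M_k, assuming the gamma-product formula attributed to Wilks."""
    s = -k * log_A
    for i in range(1, n + 1):
        s += lgamma((N - i) / 2 + k) - lgamma((N - i) / 2)
    return s

def log_approx_moment(k, n, N, log_A=0.0):
    """log of M'_k for g(z), assuming a = A*q with q = 1 - (n-1)(n-2)/(2N)."""
    q = 1 - (n - 1) * (n - 2) / (2 * N)
    m = n * (N - n) / 2
    log_a = log_A + log(q)
    return -k * log_a - n * k * log(n) + lgamma(m + n * k) - lgamma(m)

def relative_gap(k, n, N):
    """(M'_k / M_k) - 1; the common factor A cancels in the ratio."""
    return exp(log_approx_moment(k, n, N) - log_exact_moment(k, n, N)) - 1

if __name__ == "__main__":
    for n, N, k in [(1, 50, 3), (2, 50, 2), (5, 200, 1), (5, 200, 2)]:
        print(n, N, k, relative_gap(k, n, N))
```

Under these assumptions the gap vanishes (up to rounding) for n = 1 and n = 2, and for n = 5 it is of the order of $N^{-2}$, which is consistent with the statement that the two sets of moments agree to within the neglected terms.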

In order to test whether or not a sample value z = Z can be reasonably 
assumed to have been obtained in random sampling from a population of type 
(1) with fixed A, it is necessary to calculate the probability P of obtaining in 
repeated samples a value of z greater than Z. Thus it is necessary to evaluate












$$P = 1 - \int_0^Z g(z)\,dz.$$

Upon making the substitution $x = n(az)^{1/n}$, and letting $p = \frac{n(N-n)}{2} - 1$ and
$u = \frac{n(aZ)^{1/n}}{\sqrt{p+1}} = (aZ)^{1/n}\left[\frac{2n}{N-n}\right]^{1/2}$, this
integral can be reduced to the standard form of the incomplete gamma function.
Hence P assumes the form

(3) $P = 1 - I(u, p)$

where

$$I(u, p) = \frac{1}{\Gamma(p+1)}\int_0^{u\sqrt{p+1}} e^{-x}x^{p}\,dx.$$


In many applications of this distribution it will be found that the values of
u and p lie beyond the tabled³ values of these constants. Consequently, it
will often be sufficient to use the normal distribution to which the gamma 
distribution tends as N becomes large. This normal distribution will be 
considered next. 

Rather than obtain a normal approximation to g(z) or the gamma function 
to which g(z) reduces after the above transformation, it is more illuminating 
to find the basic descriptive parameters of the exact distribution of z and from 
them obtain a normal approximation. Such a procedure will show how rapidly 
the distribution of z approaches normality with increasing N. By using the
recurrence formula connecting $M_{k+1}$ and $M_k$, which can be found directly from
the ratio of these two moments, and expressing the necessary moments in


3 K. Pearson, Tables of the Incomplete Gamma Function, Biometric Laboratory (1922),
Univ. of London.
terms of $M_1$, it can be shown that these basic descriptive parameters are expressible
in expanded form as follows:

$$\mu = \phi\left[1 - \frac{n(n+1)}{2N} + \frac{n(n+1)(n-1)(3n+2)}{24N^2} + \cdots\right]$$

$$\sigma^2 = \phi^2\left[\frac{2n}{N} - \frac{n(2n^2-n+1)}{N^2} + \cdots\right]$$


_ 23n—1)[,  (m+1)n—-3) , 
=“ nN E 2(3n — 1)N + | 


fy ws 3|1 + 4(3n a or ‘|, 


These values suggest that

(4) $w = \dfrac{z - \phi}{\phi\sqrt{2n/N}}$


will likely be distributed approximately normally with zero mean and unit
variance. As a matter of fact, by using the second limit theorem of probability,⁴
it can be shown that the distribution of w approaches normality as N increases
indefinitely. Hence, for samples in which N is large compared with $n^2$, it
will be sufficient to compare the value of w arising from a sample z = Z with
its variance of unity if a test of significance is desired. A better general
approximation could have been obtained by centering the curve at $\phi\left[1 - \frac{n(n+1)}{2N}\right]$
rather than at φ; however, since there is positive skewness and the true mean
lies between these two values, there might arise some exaggeration in a significance
test in doing so because the accuracy of such a test depends upon the
accuracy of the approximation in the right hand tail of the curve.

Inspection of (3) and (4) shows that the only population parameter upon
which these approximation distributions depend is φ. There are no assumptions
necessary about the population means, or variances, or covariances,
except in so far as they may be related when the value of φ is postulated. This
means that either (3) or (4) enables one to test whether or not it is reasonable
to assume that the sample variance z = Z arose in random sampling from some
normal population with φ equal to the postulated value.


3. Population Assumptions 


Consider the set of variables $u_1, u_2, \ldots, u_n$ distributed according to the
normal law

(5) $P(u_1, u_2, \ldots, u_n) = K_1 e^{-\sum_{1}^{n} b_{ij}u_iu_j}$


4 See, for example, Fréchet and Shohat, A Proof of the Generalized Second Limit
Theorem in the Theory of Probability, Transactions of the American Mathematical Society,
Vol. 33, (1931), p. 533.
and the set of variables $v_1, v_2, \ldots, v_n$ distributed according to the normal law

(6) $P(v_1, v_2, \ldots, v_n) = K_2 e^{-\sum_{1}^{n} c_iv_i^2}$


where the v’s are uncorrelated with the u’s and with each other. The joint
distribution of the u’s and v’s is expressed by

(7) $P(u_1, \ldots, u_n, v_1, \ldots, v_n) = K_3 e^{-\sum_{1}^{n} b_{ij}u_iu_j - \sum_{1}^{n} c_iv_i^2}$

Upon writing down the determinant of the coefficients of these 2n variables,
it will become evident that any one of its principal minors of any order can be
expressed as the product of a principal minor of $|b_{ij}|$ with a principal minor of
$|c_i|$. Since the distributions (5) and (6) are normal, the determinants $|b_{ij}|$
and $|c_i|$ are positive definite; consequently the determinant of the coefficients
in (7) must also be positive definite.

Now consider the orthogonal transformation 
















Since the determinant of the coefficients in (7) is invariant under an orthogonal
transformation, the resulting distribution of the y’s may be expressed by

















(8) $P(y_1, y_2, \ldots, y_{2n}) = K_4 e^{-\sum_{1}^{2n} d_{ij}y_iy_j}$

where $|d_{ij}|$ is positive definite.

In order to obtain the distribution of the variables $y_1, y_2, \ldots, y_n$, it is
necessary to integrate (8) with respect to the variables $y_{n+1}, \ldots, y_{2n}$ over
their range of values. If this integration is performed after the quadratic form
in the exponent of (8) has been expressed as a sum of squares⁵ with coefficients
which are the ratios of principal minors of $|d_{ij}|$, it will be clear that the integration
leaves a quadratic form in the exponent which is also positive definite.
Hence after the transformation $x_i = \sqrt{2}\,y_i$ (i = 1, 2, ..., n) the distribution
function of the variables $x_i = u_i + v_i$ (i = 1, 2, ..., n) must be normal and
may be expressed by (1). Thus it has been shown that if the true parts $u_i$
of the variables $x_i$ are normally distributed without error and if the error parts
$v_i$ are normally distributed but are uncorrelated with the $u_i$ and with each
other, then the variables $x_i$ possess a normal distribution. The advantage of


5 See, for example, Risser and Traynard, Les Principes de la Statistique Mathematique, 
1933, p. 225. 
this formulation will become evident when the parameter φ is expressed in
terms of the parameters of (5) and (6).

Since the v’s are uncorrelated with the u’s and with each other, the variance
$\sigma_i^2$ of $x_i$ is the sum of the variances of $u_i$ and $v_i$, while the correlation $\rho_{ij}$ between
$x_i$ and $x_j$ may be expressed in terms of the correlation $\rho'_{ij}$ between $u_i$
and $u_j$ and the variances $\mu_i, \mu_j, \nu_i, \nu_j$ of $u_i, u_j, v_i, v_j$ respectively. These
relationships are

(9) $\sigma_i^2 = \mu_i + \nu_i$, and $\rho_{ij} = \dfrac{\rho'_{ij}}{\sqrt{(1+\nu_i/\mu_i)(1+\nu_j/\mu_j)}}$ (i ≠ j).

For simplicity of notation let $\lambda_i = \nu_i/\mu_i$. Now it is well known⁶ that φ can
be expressed in the form

$$\phi = \sigma_1^2\sigma_2^2\cdots\sigma_n^2\,|\rho_{ij}|.$$


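If the values from (9) are inserted in $|\rho_{ij}|$ and if the resulting denominators
of elements are factored out, φ will assume the form

$$\phi = \frac{\sigma_1^2\sigma_2^2\cdots\sigma_n^2\,B}{(1+\lambda_1)\cdots(1+\lambda_n)}$$

where

$$B = \begin{vmatrix} 1+\lambda_1 & \rho'_{12} & \cdots & \rho'_{1n} \\ \rho'_{21} & 1+\lambda_2 & \cdots & \rho'_{2n} \\ \vdots & \vdots & & \vdots \\ \rho'_{n1} & \rho'_{n2} & \cdots & 1+\lambda_n \end{vmatrix}.$$

Following the methods of confluence analysis,⁷ B can be expressed as follows:

$$B = R + \sum_{a}\lambda_a R_{)a(} + \sum_{a<b}\lambda_a\lambda_b R_{)ab(} + \cdots + \lambda_1\lambda_2\cdots\lambda_n$$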


where $R = |\rho'_{ij}|$, $R_{)a(}$ is the principal minor of R obtained by deleting row
and column a, etc. R is the true correlation determinant whose rank it is the
object of this paper to test. If R is assumed to be of rank n − t, then all
principal minors containing more than n − t rows vanish and B reduces to

$$B = \sum_{a_1<\cdots<a_t}\lambda_{a_1}\cdots\lambda_{a_t}R_{)a_1\cdots a_t(} + \cdots + \lambda_1\lambda_2\cdots\lambda_n.$$

The tests (3) and (4) were designed to test hypothetical values of φ by means
of the sample Z. Evidently the value of φ can be postulated by assigning
hypothetical values to the λ’s, the σ’s, and the principal minors of R.
Assigning values to the λ’s does not curtail the degrees of freedom in these

6S. S. Wilks, loc. cit., p. 477. 


7 Ragnar Frisch, Statistical Confluence Analysis by Means of Complete Regression 
Systems, Oslo, 1934. 
tests because they were derived on the basis of (1) which depends only on the
m’s, σ’s, and ρ’s. The λ’s do restrict the range of the ρ’s, but not their degrees
of freedom.

An inspection of the expression for φ shows that φ can be made to assume
any desired value regardless of the rank of R by merely assigning the σ’s
properly. It is therefore necessary to make some assumption regarding the
σ’s if the test is to serve the purpose for which it is intended. Here it will be
sufficient to assume that the product of the population variances may be replaced
by the product of the sample variances. This assumption will ordinarily
be approximately fulfilled for the size samples for which it is legitimate to
employ (3) or (4); consequently this assumption does not restrict the range of
application of the test.

To postulate values of the principal minors of R beyond postulating the rank 
of R would introduce hypotheses and restrictions which are irrelevant to the 
fundamental purpose of the test. This difficulty will be avoided by replacing 
all non-vanishing minors of R by their upper bounds of unity. Since this 
will overestimate the value of B, and hence of φ, the usual significance level of
.05 may be considered as decisive. Let the value of B when unity is inserted
for all non-vanishing principal minors be denoted by D. Then
















(10) $D = \sum_{a_1<\cdots<a_t}\lambda_{a_1}\lambda_{a_2}\cdots\lambda_{a_t} + \cdots + \lambda_1\lambda_2\cdots\lambda_n.$


















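Since

$$\prod_{a=1}^{n}(1+\lambda_a) = 1 + \sum_{a=1}^{n}\lambda_a + \sum_{a_1<a_2}\lambda_{a_1}\lambda_{a_2} + \cdots + \lambda_1\lambda_2\cdots\lambda_n$$

it will often be convenient to write D in the form

(11) $D = \prod_{a=1}^{n}(1+\lambda_a) - \left\{1 + \sum_{a}\lambda_a + \cdots + \sum_{a_1<\cdots<a_{t-1}}\lambda_{a_1}\cdots\lambda_{a_{t-1}}\right\}.$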
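
The equivalence of the two ways of writing D can be illustrated with a small sketch. It is not taken from the paper: it assumes that D is the sum of the elementary symmetric functions of the λ’s of order t and higher, and that the alternative form subtracts those of order below t from the product of the (1 + λ); the function names are hypothetical.

```python
from itertools import combinations
from math import prod

def elementary_symmetric(lams, v):
    """Sum of all products of v distinct lambdas (equals 1 when v == 0)."""
    return sum(prod(c) for c in combinations(lams, v))

def D_direct(lams, t):
    """Assumed form (10): products of t, t+1, ..., n lambdas summed."""
    n = len(lams)
    return sum(elementary_symmetric(lams, v) for v in range(t, n + 1))

def D_complement(lams, t):
    """Assumed form (11): full product minus the terms of order below t."""
    full = prod(1 + x for x in lams)
    return full - sum(elementary_symmetric(lams, v) for v in range(t))

if __name__ == "__main__":
    lams = [0.087, 0.119, 0.101, 0.773]
    print(D_direct(lams, 2), D_complement(lams, 2))
```

The direct form is cheaper when t is close to n (few high-order terms), while the complement form is cheaper when t is small.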


As a consequence of all the above assumptions,

(12) $\dfrac{Z}{\phi} = \dfrac{|a_{ij}|}{\phi} = \dfrac{(1+\lambda_1)\cdots(1+\lambda_n)\,|r_{ij}|}{B} \ge \dfrac{(1+\lambda_1)\cdots(1+\lambda_n)\,|r_{ij}|}{D}$

where $|r_{ij}|$ is the sample correlation determinant.

All the essential material for testing the rank of the true correlation matrix 
is contained in (3), (4), (11), and (12). In summary, the hypothesis to be tested 
and the procedure to follow in performing the test are as follows. 

The population of n variables from which the sample is supposed drawn is 
assumed to be such that (a) the true parts of the variables are normally dis- 
tributed, (b) the error parts are normally distributed but are uncorrelated 
with the true parts and with each other, (c) the product of the variances may 
be replaced by the product of the sample variances, (d) the values of the λ’s
are postulated as judged by the accuracy in measurement of the variables, and
(e) the rank of the true correlation matrix is n − t.

Given the value $|r_{ij}|$ of the sample correlation determinant, a lower bound
for the value of Z/φ is calculated from (11) and (12). This lower bound is
inserted in either (3) or (4), depending on the size of the sample. If (3) is
used and if P < .05, or if (4) is used and w ≥ 2, one may conclude, as judged
by the sample variance, that it is very unlikely that the sample was drawn in
random sampling from the population specified above. If one has reason to
believe that the variables are sensibly normal as indicated above and that the
postulated values of the λ’s are quite accurate, then the test shows quite definitely
that the postulated rank of the true correlation matrix is unsubstantiated
by the sample, and therefore a higher rank should be tested until a non-significant
value is obtained. Because a lower bound rather than the value of Z/φ
is used, the test can be used on minimum ranks only, and hence a value of
Z < φ will not yield a test of significance. However, the test does handle the
problem for which it was designed and which is of fundamental interest, and
that is to see whether or not one is justified in assuming that a sample represents
only a certain minimum number of components.


4. Applications 


(a) Hotelling⁸ has used an example taken from other sources to illustrate
his test on components. In order to compare results, this same example will
be treated here under the assumptions outlined above. In this example the
reliability coefficients are given. From the definition of a reliability coefficient
$r_i$, it follows at once that $r_i = \frac{1}{1+\lambda_i}$. The population values of the λ’s will
be set equal to the values obtained from these sample reliability coefficients.
The data for this problem are

$|r_{ij}| = .235$, N = 140, n = 4, $\lambda_1 = .087$, $\lambda_2 = .119$, $\lambda_3 = .101$, $\lambda_4 = .773$.


Assume that the true correlation matrix in the population is of rank two, that
is, that two components are sufficient to describe the results on these tests.
Since N is large compared with $n^2$, it will be sufficient to use (4). The values
of (11), (12), and (4) are found to be

$$D = \prod(1+\lambda_a) - \left\{1 + \sum\lambda_a\right\} = .294$$

$$\frac{Z}{\phi} \ge \frac{\prod(1+\lambda_a)\,|r_{ij}|}{D} = 1.90$$

$$w \ge \sqrt{\frac{N}{2n}}\,[1.90 - 1] = 3.76$$





8 Loc. cit., p. 16.
Since the standard deviation of w is unity, this value demonstrates clearly
that the hypothesis of only two components is untenable as judged by the 
sample correlation determinant. If one assumes three components, the test 
will be found to yield a non-significant value. Hence it may be concluded that 
under the hypotheses on which the test is based, the sample does not justify 
the assumption of less than three components. Hotelling’s test indicated the 
necessity for two components but was uncertain about the third, the decision 
resting upon a variate value of 1.31 as against a standard deviation of unity. 
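
The arithmetic of example (a) can be retraced with a short sketch. It is illustrative only and rests on assumptions not spelled out in this form in the text: D is taken as the sum of the elementary symmetric functions of the λ’s of order t and above, the lower bound for Z/φ as the product of the (1 + λ) times the sample correlation determinant divided by D, and w as (Z/φ − 1) divided by the square root of 2n/N. The function names are hypothetical.

```python
from math import sqrt

def symmetric_sums(lams):
    """Coefficients e_0, e_1, ..., e_n of the product of (1 + lam * x)."""
    coeffs = [1.0]
    for lam in lams:
        nxt = coeffs + [0.0]
        for v in range(len(coeffs)):
            nxt[v + 1] += coeffs[v] * lam
        coeffs = nxt
    return coeffs

def rank_test_bound(det_r, N, lams, t):
    """Return (D, lower bound for Z/phi, lower bound for w) for rank n - t."""
    n = len(lams)
    e = symmetric_sums(lams)
    D = sum(e[t:])                      # terms with at least t lambdas
    ratio = sum(e) * det_r / D          # sum(e) equals the product of (1 + lam)
    w = (ratio - 1) / sqrt(2 * n / N)   # assumed normal-approximation statistic
    return D, ratio, w

if __name__ == "__main__":
    D, ratio, w = rank_test_bound(0.235, 140, [0.087, 0.119, 0.101, 0.773], t=2)
    print(round(D, 3), round(ratio, 3), round(w, 2))
```

With the data quoted above (rank two, so t = 2) this gives D near .294 and a bound for Z/φ near 1.90; the resulting w is about 3.75 when carried at full precision, and 3.76 when the ratio is first rounded to 1.90.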

(b) Thurstone, in his “Vectors of Mind,” considers an example taken from a
series of fifteen psychological tests. After applying his centroid method to the 
data, he inspects his results and concludes that four components are sufficient 
to account for everything except random errors. It is impossible to test his 
conclusions explicitly as above because the size of the sample is not given and 
the reliability coefficients are not known. Nevertheless, if it is legitimate to 
assume that the sample is sufficiently large to justify the use of this test, in- 
teresting conclusions can be obtained on the assumption that only four com- 
ponents are needed. 

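Suppose that $\lambda_i = \tfrac12$, which implies that the variance of error is half as large
as the true sampling variance for each variable. Here (10) is more convenient
than (11) for computing the value of D. The values of (10) and (12) are
found to be

$$D = {}_{15}C_3\left(\tfrac12\right)^{12} + {}_{15}C_2\left(\tfrac12\right)^{13} + {}_{15}C_1\left(\tfrac12\right)^{14} + \left(\tfrac12\right)^{15} = .125$$

$$\frac{Z}{\phi} \ge \frac{|r_{ij}|}{.0003}.$$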


Evidently, the value of $|r_{ij}|$ must lie in the neighborhood of .0003 if the test
is not to yield a significant result which contradicts the hypothesis. However,
the correlations in $|r_{ij}|$ are given to only three decimal places, and therefore
a legitimate value in the neighborhood of .0003 can not be realized. It is to be
noted that the postulated values of the λ’s are equivalent to postulating that
all reliability coefficients are equal to $\tfrac23$, a value which should be considered as
unusually low. It would seem reasonable to avoid using material in which the 
variance of error is larger than one-half the variance of random sampling, unless 
the variance of random sampling is exceedingly small. 





CONTRIBUTIONS TO THE THEORY OF COMPARATIVE STATISTICAL
ANALYSIS. I. FUNDAMENTAL THEOREMS OF
COMPARATIVE ANALYSIS¹


By William G. Madow


This is the first of several papers in which there will be presented a general 
approach to the statistical examination of hypotheses which are false if any of 
several things are true. Phenomena requiring such a statistical theory are 
investigated quite frequently. As examples may be cited the studies of lag 
correlation in time series, periodogram analysis in geophysics, factor analysis 
in psychology, and analysis into components in agriculture.²

The theorems of this paper have one purpose: to permit the reduction of the 
distributions by which the hypotheses are to be tested to essentially the joint 
distribution of the statistics which contain the information offered by the data 
concerning the truth or falsity of the things which will negate the hypotheses. 
In order to do this it has been necessary to generalize the theorem of Poincaré
on the probability that at least one of several events occurs.³ As illustrations
there are stated, after Theorems III, VI, and IX, generalizations of a distribution
derived by Jordan, (5) page 109.⁴

In a second paper, we shall give a complete derivation of the joint distribu- 
tions necessary for the applications of the analysis of variance. A reconsidera- 
tion of the Schuster periodogram will be included. In other papers these 
results will be extended to problems arising in the theory of regression, and to 
problems of the distributions of medians, etc. 

The fundamental theorems of comparative analysis are now obtained in such 
a form that they are applicable to problems in the theory of probability no 
matter what the distributions may be. Some special cases of these theorems⁵


1 Presented to the American Mathematical Society, March 27, 1937. Research under a 
grant-in-aid from the Carnegie Corporation of New York. 

2 Naturally these techniques are also useful in other branches of science than those in
which they were first applied. It should be noted that by analysis into components we
here refer to the work of Fisher, (2), chapter 6.

3 See Poincaré, (7), page 60. This theorem is attributed to Poincaré by Jordan, (5),
and Fréchet, (3).

4 This distribution states the probability that in r trials of an experiment which has
exactly n possible results, these results being mutually exclusive, each of the possible
results occurs at least once. Jordan’s derivation has been simplified by Fréchet, (3),
page 12.

5 The theorems are, of course, part of the theory of measure and integration.
have been used in connection with the derivation of distributions of positional
statistics such as the kth in order of N elements,⁶ and others.
Let Ω be a collection of elements z, and let $\mathfrak{A}$ be a set of subsets of Ω. Then,
the axioms which the elements of $\mathfrak{A}$ are to satisfy are⁷
I. $\mathfrak{A}$ is a field;⁸
II. $\Omega \in \mathfrak{A}$;
III. To every $A \in \mathfrak{A}$ there is ordered a non-negative real number P(A);
IV. P(Ω) = 1;
V. If $A \in \mathfrak{A}$ and $B \in \mathfrak{A}$, and AB = 0, then P(A + B) = P(A) + P(B).
We shall regard Ω as the set of possible results of an experiment $\mathfrak{E}$. By events
we shall mean elements of $\mathfrak{A}$. The complement $\bar A$ of A with respect to Ω will
be an element of $\mathfrak{A}$ if A is an element of $\mathfrak{A}$. $\bar A$ consists of all elements of Ω
which are not elements of A and hence is the event which occurs if and only
if A does not occur.⁹
Let the subsets of Ω

(1) $E_1, E_2, \ldots, E_k$

be elements of the field $\mathfrak{A}$. Then, if $a_1, a_2, \ldots, a_k$ is a permutation of 1, 2, ..., k,
the set

(2) $E_{a_1}E_{a_2}\cdots E_{a_j}\bar E_{a_{j+1}}\cdots\bar E_{a_k}$

is an element of $\mathfrak{A}$ and is the event which occurs whenever all the events
$E_{a_1}, E_{a_2}, \ldots, E_{a_j}$ occur, while none of the events $E_{a_{j+1}}, E_{a_{j+2}}, \ldots, E_{a_k}$
occur.

The events (1) are said to be independent if and only if

(3) $P(E_{a_1}\cdots E_{a_j}\bar E_{a_{j+1}}\cdots\bar E_{a_k}) = \prod_{\nu=1}^{j}P(E_{a_\nu})\cdot\prod_{\nu=j+1}^{k}P(\bar E_{a_\nu})$

for all selections of the sets (1) and their complements.¹⁰
Theorem I. The probability that the first j of the k events (1) occur, while the 
remaining k — 7 events do not occur, is 












6 See, for example, Gumbel, (4). It is noted that Theorems I, II, and III are stated by 
Arne Fisher, (1), page 42, who assumes, however, that the events are independent. 

7 These axioms are stated by Kolmogoroff, (6), page 2. 

8 A set of sets is a field if the fact that A and B are elements of the set implies that 
A+ B,AB,and A — AB are also elements of the set. 

9 The event A will be said to have occurred if the result of the performance of the experiment
$\mathfrak{E}$ is an element of A.

10 See Kolmogoroff, (6), page 9 for a discussion of various equivalent definitions of 
independence. 
(4) $P(E_1\cdots E_j\bar E_{j+1}\cdots\bar E_k) = \sum_{\nu=0}^{k-j}(-1)^{\nu}\sum_{\substack{a_1,\ldots,a_\nu=j+1\\ a_1<\cdots<a_\nu}}^{k}P(E_1\cdots E_jE_{a_1}\cdots E_{a_\nu}).^{11}$


Proof. Let k = j + 1. Then it follows from Axiom V that
(5) $P(E_1E_2\cdots E_j) = P(E_1E_2\cdots E_jE_{j+1}) + P(E_1E_2\cdots E_j\bar E_{j+1})$.
Hence the theorem is true for k = j + 1 and any j > 0. Let the theorem be
true for k = j, j + 1, ..., k − 1. From Axiom V it follows that
(6) $P(E_1\cdots E_j\bar E_{j+1}\cdots\bar E_k) = P(E_1\cdots E_j\bar E_{j+1}\cdots\bar E_{k-1}) - P(E_1\cdots E_j\bar E_{j+1}\cdots\bar E_{k-1}E_k)$.

Substituting from (4) the theorem is proved.

Let $n \ge n_1 + \cdots + n_t$, $n_i \ge 0$ (i = 1, ..., t); and let

$$\frac{n!}{n_1!\,n_2!\cdots n_t!\,(n - n_1 - \cdots - n_t)!} = (n;\, n_1, n_2, \ldots, n_t).$$


Corollary. If, for each value of ν (ν = 1, 2, ..., k − j), the (k − j; ν)
terms

$$P(E_1\cdots E_jE_{a_1}\cdots E_{a_\nu})$$

which can be obtained by selecting $a_1, a_2, \ldots, a_\nu$ without repetition from
j + 1, j + 2, ..., k, are all equal, then

(7) $P(E_1\cdots E_j\bar E_{j+1}\cdots\bar E_k) = \sum_{\nu=0}^{k-j}(-1)^{\nu}(k-j;\nu)\,P(E_1\cdots E_{j+\nu})$.
Let

(8) $S(\nu) = \sum_{\substack{a_1,\ldots,a_\nu=1\\ a_1<\cdots<a_\nu}}^{k}P(E_{a_1}E_{a_2}\cdots E_{a_\nu})$

where the summation extends over the (k; ν) terms
(9) $P(E_{a_1}E_{a_2}\cdots E_{a_\nu})$

which can be obtained by selecting ν of the k events (1) without repetition.
If all the terms (9) which can be obtained by selecting ν of the k events (1)
without repetition are equal, then

(10) $S(\nu) = (k;\nu)\,P(E_1\cdots E_\nu)$.


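11 By definition

$$\sum_{\nu=0}^{k-j}(-1)^{\nu}\sum_{\substack{a_1,\ldots,a_\nu=j+1\\ a_1<\cdots<a_\nu}}^{k}P(E_1\cdots E_jE_{a_1}\cdots E_{a_\nu}) = P(E_1\cdots E_j) + \sum_{\nu=1}^{k-j}(-1)^{\nu}\sum_{\substack{a_1,\ldots,a_\nu=j+1\\ a_1<\cdots<a_\nu}}^{k}P(E_1\cdots E_jE_{a_1}\cdots E_{a_\nu}).$$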
Theorem II. The probability that exactly j of the k events (1) occur is

(11) $P_{(j)} = \sum_{\nu=0}^{k-j}(-1)^{\nu}(j+\nu;\nu)\,S(j+\nu)$.

Proof. If $A_{(j)}$ is the subset of Ω defined by the requirement that exactly j
of the events (1) occur, then $A_{(j)}$ is the sum of (k; j) disjunct sets:

(12) $A_{(j)} = \sum_{a_1<\cdots<a_j}E_{a_1}\cdots E_{a_j}\bar E_{a_{j+1}}\cdots\bar E_{a_k}$

where $a_{j+1}, \ldots, a_k$ have those of the values 1, ..., k which remain after the
selection of $a_1, \ldots, a_j$. By Axiom V we may replace A by P in (12). Upon
substituting from (4) we note that the resulting terms of (12) which depend on
the same number ν, ν = j, ..., k, of events have the same sign, that all S(ν),
ν = j, ..., k, occur, that no term depending on fewer than j events occurs,
and that any particular $P(E_{a_1}E_{a_2}\cdots E_{a_{j+t}})$ will occur in those of the terms
of (12) the j occurring events of which are a subset of $E_{a_1}, E_{a_2}, \ldots, E_{a_{j+t}}$
and will occur in no other term of (12). Hence the coefficient of S(j + t) in
(11) is $(-1)^{t}(j+t;t)$. This completes the proof of the theorem.

Corollary. If (10) is true for ν = j, ..., k, then

(13) $P_{(j)} = \sum_{\nu=0}^{k-j}(-1)^{\nu}(k;j,\nu)\,P(E_1E_2\cdots E_{j+\nu})$.


Theorem III. The probability that at least j of the k events (1) occur is 


k-j 


(14) P® = 2. (-1 G+ r- 1;v) S(j + v). 


v=0 


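Proof. If $A^{(j)}$ is the subset of $\Omega$ defined by the requirement that at least $j$
of the events (1) occur, then $A^{(j)}$ is the sum of $k - j + 1$ disjunct sets:

(15) $A^{(j)} = A_{[j]} + A_{[j+1]} + \cdots + A_{[k]}.$

By Axiom V we may replace $A$ by $P$ in (15). Substituting from (11)

(16) $P^{(j)} = \sum_{\nu=0}^{k-j} c_\nu\, S(j+\nu),$

where

$c_\nu = (j+\nu; j+\nu) - (j+\nu; j+\nu-1) + \cdots + (-1)^{\nu} (j+\nu; j), \quad (\nu = 0, \cdots, k-j).$

It is easy to prove that

(17) $(-1)^{\nu} (j+\nu-1; \nu) = \sum_{t=0}^{\nu} (-1)^{\nu-t} (j+\nu; j+t).$

The right-hand side of (17) is $c_\nu$, so that (16) reduces to (14).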
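The identity (17) involves only integers, so it can be checked exactly for small values. The sketch below uses hypothetical helper names (`lhs`, `rhs`, `identity_holds`) and reads $(n; r)$ as the ordinary binomial coefficient; it is a sanity check over a finite range, not a proof.

```python
from math import comb


def lhs(j, v):
    """Left-hand side of (17): (-1)^v (j+v-1; v)."""
    return (-1) ** v * comb(j + v - 1, v)


def rhs(j, v):
    """Right-hand side of (17): sum over t of (-1)^(v-t) (j+v; j+t)."""
    return sum((-1) ** (v - t) * comb(j + v, j + t) for t in range(v + 1))


def identity_holds(max_j=8, max_v=8):
    """Check (17) for 1 <= j <= max_j and 0 <= v <= max_v."""
    return all(lhs(j, v) == rhs(j, v)
               for j in range(1, max_j + 1)
               for v in range(max_v + 1))


print(identity_holds())
```

The range starts at `j = 1` because for `j = 0` the left-hand side would need a binomial coefficient with a negative upper index, which `math.comb` rejects.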


Corollary. If (10) is true for $\nu = j, \cdots, k$, then

(18) $P^{(j)} = \sum_{\nu=0}^{k-j} (-1)^{\nu} (j+\nu-1; \nu)(k; j+\nu)\, P(E_1 E_2 \cdots E_{j+\nu}).$
To provide examples illustrating these theorems let us consider $r$ experiments

(19) $E^{(1)}, E^{(2)}, \cdots, E^{(r)}.$

Let $E^{(i)}$ have $k$ mutually exclusive outcomes

(20) $O_1^{(i)}, \cdots, O_k^{(i)}.$

Then, it is easy to define the spaces $\Omega$, $\Omega^{(i)}$, the probability function $P_i(E^{(i)})$,
the combinatory product

$\Omega = \Omega^{(1)} \times \Omega^{(2)} \times \cdots \times \Omega^{(r)},$

the set A and the probability function $P(E)$ so that Axioms I, $\cdots$, V are satisfied
and hence Theorems I, II, and III are valid.

We shall assume that the experiments (19) are independent.

Let

$\bar{O}_j, \quad (j = 1, \cdots, k),$

be the event which occurs when neither $O_j^{(1)}$ nor $O_j^{(2)}$ nor $\cdots$ nor $O_j^{(r)}$ occur.
Then $O_j$ occurs if upon performance of the experiments (19) at least one of
$O_j^{(1)}, O_j^{(2)}, \cdots, O_j^{(r)}$ occur.

It is an immediate result of the definition of independence that

(21) $P(\bar{O}_{\alpha_1} \bar{O}_{\alpha_2} \cdots \bar{O}_{\alpha_\nu}) = \prod_{i=1}^{r} \{1 - P(O^{(i)}_{\alpha_1}) - \cdots - P(O^{(i)}_{\alpha_\nu})\}.$


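From Theorem I, the probability that $O_1, O_2, \cdots, O_j$ each occur while not
one of $O_{j+1}, O_{j+2}, \cdots, O_k$ occurs is

(22) $P(O_1 \cdots O_j \bar{O}_{j+1} \cdots \bar{O}_k) = \sum_{\nu=0}^{j} (-1)^{\nu} \sum_{\substack{\alpha_1, \cdots, \alpha_\nu = 1 \\ \alpha_1 < \cdots < \alpha_\nu}}^{j} \prod_{i=1}^{r} \{1 - P(O^{(i)}_{\alpha_1}) - \cdots - P(O^{(i)}_{\alpha_\nu}) - P(O^{(i)}_{j+1}) - \cdots - P(O^{(i)}_{k})\}.$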


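From Theorem II, the probability that exactly $j$ of $O_1, O_2, \cdots, O_k$ occur is

(23) $P_{[j]} = \sum_{\nu=0}^{j} (-1)^{\nu} (k-j+\nu; \nu)\, S(k-j+\nu),$

where

$S(k-j+\nu) = \sum_{\substack{\alpha_1, \alpha_2, \cdots, \alpha_{k-j+\nu} = 1 \\ \alpha_1 < \alpha_2 < \cdots < \alpha_{k-j+\nu}}}^{k} \prod_{i=1}^{r} \{1 - P(O^{(i)}_{\alpha_1}) - \cdots - P(O^{(i)}_{\alpha_{k-j+\nu}})\}.$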


Since the probability that at least $j$ of $O_1, O_2, \cdots, O_k$ occur is equal to 1
minus the probability that at least $k - j + 1$ of $\bar{O}_1, \bar{O}_2, \cdots, \bar{O}_k$ occur,¹² it
follows at once from Theorem III that

(24) $P\{\text{at least } j \text{ of } O_1, \cdots, O_k \text{ occur}\} = 1 - \sum_{\nu=0}^{j-1} (-1)^{\nu} (k-j+\nu; \nu)\, S(k-j+\nu+1).$

12 There are, of course, other ways of computing these probabilities.
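Formulas (23) and (24) can be compared with a direct enumeration for a few independent experiments. In the sketch below, `p[i][j]` is assumed to hold the probability of outcome `j` in experiment `i` (each row summing to 1); that layout and the function names are illustrative choices made here, not notation from the paper.

```python
from itertools import combinations, product
from math import comb, prod


def S_bar(p, m):
    """S(m): sum over m-subsets A of outcomes of prod_i {1 - sum_{a in A} p[i][a]}."""
    k = len(p[0])
    return sum(prod(1 - sum(row[a] for a in A) for row in p)
               for A in combinations(range(k), m))


def exactly_j_outcomes(p, j):
    """Formula (23): exactly j of the k outcomes appear at least once."""
    k = len(p[0])
    return sum((-1) ** v * comb(k - j + v, v) * S_bar(p, k - j + v)
               for v in range(j + 1))


def at_least_j_outcomes(p, j):
    """Formula (24): at least j of the k outcomes appear at least once."""
    k = len(p[0])
    return 1 - sum((-1) ** v * comb(k - j + v, v) * S_bar(p, k - j + v + 1)
                   for v in range(j))


def brute(p, j):
    """Enumerate every sequence of outcomes and keep those with j distinct values."""
    k = len(p[0])
    total = 0.0
    for seq in product(range(k), repeat=len(p)):
        if len(set(seq)) == j:
            total += prod(row[o] for row, o in zip(p, seq))
    return total
```

Calling `exactly_j_outcomes(p, j)` and `brute(p, j)` on the same small matrix `p` should give the same value up to rounding; the enumeration grows as `k ** r`, so it is only practical for small cases.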
The case treated by Fréchet and Jordan is that which occurs when we assume
$P(O_j^{(i)}) = P(O_j^{(h)})$, $(j = 1, \cdots, k)$, $(i, h = 1, \cdots, r)$ and in (24) let $j = 1$.
It is not difficult to obtain further generalizations of Jordan's distribution by
defining events which occur if and only if fewer than $j'$ of $r$ events occur and
then proceeding as above.
Certain useful generalizations of Theorems I, II, and III will now be derived.
Let the subsets of $\Omega$

(25) $E_1^{(s)}, E_2^{(s)}, \cdots, E_{k^{(s)}}^{(s)}, \quad (s = 1, \cdots, p),$

be elements of A, and let $N = k^{(1)} + k^{(2)} + \cdots + k^{(p)}$.
Let $j^{(s)} \le k^{(s)}$, $(s = 1, \cdots, p)$; and let


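(26) $Q^{(h)} = \prod_{s=1}^{h} \prod_{i=1}^{j^{(s)}} E_i^{(s)}, \quad (h = 1, \cdots, p).$

Let

(27) $\bar{Q}^{(h)} = \prod_{s=1}^{h} \prod_{i=j^{(s)}+1}^{k^{(s)}} \bar{E}_i^{(s)}, \quad (h = 1, \cdots, p).$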
Furthermore, let for each value of $s$, $(s = h, \cdots, p)$, the $(k^{(s)} - j^{(s)}; \nu^{(s)})$
possible distinct selections of $\nu^{(s)}$ of the $k^{(s)} - j^{(s)}$ sets

(28) $E^{(s)}_{j^{(s)}+1}, E^{(s)}_{j^{(s)}+2}, \cdots, E^{(s)}_{k^{(s)}}$

be arranged in some order, and, if the intersection of the $\nu^{(s)}$ sets of the $i_s$th
selection be denoted by

(29) $q^{(s)}_{i_s}(\nu^{(s)}),$

let

(30) $q^{(h)}_{i_h \cdots i_p}(\nu^{(h)}, \cdots, \nu^{(p)}) = \prod_{s=h}^{p} q^{(s)}_{i_s}(\nu^{(s)}).$

There are $\prod_{s=h}^{p} (k^{(s)} - j^{(s)}; \nu^{(s)})$ sets (30), for each value of $h$, $(h = 1, \cdots, p)$,
and any set of fixed values of $\nu^{(h)}, \cdots, \nu^{(p)}$.
Let for each value of $s$, $(s = h, \cdots, p)$, the $(k^{(s)}; \nu^{(s)})$ possible distinct
selections of $\nu^{(s)}$ of the $k^{(s)}$ sets

(31) $E_1^{(s)}, E_2^{(s)}, \cdots, E_{k^{(s)}}^{(s)}$

be arranged in some order, and if the intersection of the sets of the $i_s$th selection
be denoted by

(32) $g^{(s)}_{i_s}(\nu^{(s)}),$

let

(33) $g^{(h)}_{i_h \cdots i_p}(\nu^{(h)}, \cdots, \nu^{(p)}) = \prod_{s=h}^{p} g^{(s)}_{i_s}(\nu^{(s)}).$

There are $\prod_{s=h}^{p} (k^{(s)}; \nu^{(s)})$ sets (33), for each value of $h$, $(h = 1, \cdots, p)$, and any
set of fixed values of $\nu^{(h)}, \cdots, \nu^{(p)}$.


It is clear that the various sets that have been defined are elements of A. 
The fact that the sets are the events which occur if and only if certain sets of 
events occur is also too obvious to require further comment. 

Theorem IV. The probability that of the $N$ events (25) the first $j^{(s)}$ of superscript
$s$ occur and the remaining $k^{(s)} - j^{(s)}$ of superscript $s$ do not occur, $s = 1, \cdots, p$, is

(34) $P(Q^{(p)} \bar{Q}^{(p)}) = \sum_{\nu^{(1)}=0}^{k^{(1)}-j^{(1)}} \sum_{\nu^{(2)}=0}^{k^{(2)}-j^{(2)}} \cdots \sum_{\nu^{(p)}=0}^{k^{(p)}-j^{(p)}} (-1)^{\nu^{(1)}+\cdots+\nu^{(p)}} \sum_{i_1=1}^{(k^{(1)}-j^{(1)};\, \nu^{(1)})} \cdots \sum_{i_p=1}^{(k^{(p)}-j^{(p)};\, \nu^{(p)})} P[Q^{(p)} q^{(1)}_{i_1 \cdots i_p}(\nu^{(1)}, \cdots, \nu^{(p)})].$

Proof. Theorem I is a proof of Theorem IV for $p = 1$. The theorem may
then be proved either by regarding it as a special case of Theorem I and
collecting terms, or by induction.


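Corollary. If, for each possible set of values of $\nu^{(1)}, \nu^{(2)}, \cdots, \nu^{(p)}$, the

$\prod_{s=1}^{p} (k^{(s)} - j^{(s)}; \nu^{(s)})$

terms

(35) $P[Q^{(p)} q^{(1)}_{i_1 \cdots i_p}(\nu^{(1)}, \cdots, \nu^{(p)})]$

are all equal, then

(36) $P(Q^{(p)} \bar{Q}^{(p)}) = \sum_{\nu^{(1)}=0}^{k^{(1)}-j^{(1)}} \cdots \sum_{\nu^{(p)}=0}^{k^{(p)}-j^{(p)}} (-1)^{\nu^{(1)}+\cdots+\nu^{(p)}} \prod_{s=1}^{p} (k^{(s)} - j^{(s)}; \nu^{(s)})\, P[Q^{(p)} q^{(1)}_{1 \cdots 1}(\nu^{(1)}, \cdots, \nu^{(p)})].$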
Let, for each value of $h$, $(h = 1, \cdots, p)$,


(h) (h+1) (p) 
a oe 6st oD 


(37) (kh); (h)) (k(P):y(P)) 


Pig?” ere SP ar y)], 


i,=1 ip=l 
It is apparent that by using (34) it is possible to obtain an expression for (37) 
which does not depend explicitly on Q“”’. In fact 


R)-~j;Q) k(h-1)—7 (h-1) 


ce —1) 
Si, ned py?) ae = ihe > <r +y(h—-1 


y(1)=0 py(A-1)=0 


(38) (k(1)—7 (1)3y(1)) (k(A-1)— 7 (A-1)sy(h-1)) (RCA) sy (A)) (k(P) sy (P)) 


i}=1 ip-1=1 i,=1 ip=l 


Pi *"1G", rok yo) gir . ‘tay 
If the different terms of (37) are all equal, then
p 
(39) a. hth y'?) -“ I (kK; “rq pgm. ee y)), 


If the different terms of (38) are all equal, then 
K-71) k(h—-1)— 3 (h-1) 


Si, heey y'?) a 7 eae 7 Lal sre 
y(1)—=0 y(h=Daxo 


h—-1 p 
(40) II (k — jv) Il (k®; y) 
s=h 


s=1 


exe ( (h—1)\ -1--- (h) 
Pl¢ vy wf" ia ”\@ _. ooo. 0™)] 


Theorem V. The probability that of the $N$ events (25) the first $j^{(s)}$ of superscript
$s$ occur and the remaining $k^{(s)} - j^{(s)}$ do not occur, $(s = 1, \cdots, h-1)$, and exactly
$j^{(s)}$ events of superscript $s$ occur $(s = h, \cdots, p)$, is

(41) $P_{[j^{(h)} \cdots j^{(p)}]}(Q^{(h-1)} \bar{Q}^{(h-1)}) = \sum_{\nu^{(h)}=0}^{k^{(h)}-j^{(h)}} \cdots \sum_{\nu^{(p)}=0}^{k^{(p)}-j^{(p)}} (-1)^{\nu^{(h)}+\cdots+\nu^{(p)}} \prod_{s=h}^{p} (j^{(s)}+\nu^{(s)}; \nu^{(s)})\, S(j^{(h)}+\nu^{(h)}, \cdots, j^{(p)}+\nu^{(p)}).$

Proof. The theorem may be proved, either by induction using Theorem II,
or by obtaining disjunct sets as in Theorem II and using Theorem IV.
Corollary I. If (39) is true for all sets of possible values of $\nu^{(h)}, \cdots, \nu^{(p)}$,
then
KA) aj (h) k(p)—j (p) 


Poin... (Qe Qe’) = eee DE (eye trit 


py(h)=0 vy(P)=0 


(42) 
(mem (h—1) Alh—1)" alee (h) (p) 
TT & 5 5°, 0) PIQe Qe G1, «+, ™). 


s=h 


Corollary II. If (40) is true for all sets of possible values of $\nu^{(1)}, \cdots, \nu^{(p)}$,
then


5 eae 


k(1)—7 (1) k(p)—j (Pp) 


Pyw...j0(Q*?” or) = 7 — 2. (~ "ila it 


y(l)=9 y(P)=0 


43 8 «(s 8 n s) *(s 8 
( ) II (Kk al a y) I] (k“ 55") J DY 


s=1 


Pid, ny Sf” oe y?)). 


Theorem VI. The probability that of the $N$ events (25) the first $j^{(s)}$ events of
superscript $s$ occur and the remaining $k^{(s)} - j^{(s)}$ do not occur, $s = 1, \cdots, g-1$, exactly
$j^{(s)}$ events of superscript $s$ occur $(s = g, \cdots, h-1)$, and at least $j^{(s)}$ events of
superscript $s$ occur $(s = h, \cdots, p)$ is

(44) $P^{(j^{(h)} \cdots j^{(p)})}_{[j^{(g)} \cdots j^{(h-1)}]}(Q^{(g-1)} \bar{Q}^{(g-1)}) = \sum_{\nu^{(g)}=0}^{k^{(g)}-j^{(g)}} \cdots \sum_{\nu^{(p)}=0}^{k^{(p)}-j^{(p)}} (-1)^{\nu^{(g)}+\cdots+\nu^{(p)}} \prod_{s=g}^{h-1} (j^{(s)}+\nu^{(s)}; \nu^{(s)}) \prod_{s=h}^{p} (j^{(s)}+\nu^{(s)}-1; \nu^{(s)})\, S(j^{(g)}+\nu^{(g)}, \cdots, j^{(p)}+\nu^{(p)}).$

Proof. The theorem may be proved either by induction using Theorem III
or by obtaining disjunct sets as in Theorem III and using Theorem V.
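The simplest instance of Theorems V and VI is two families of events with no preassigned events, so that only the "exactly" and "at least" requirements remain. The sketch below checks that case against direct enumeration on a finite space. All names (`prob`, `S2`, `joint`, `joint_bruteforce`) and the test space are assumptions made here for illustration; `S2(a, b)` is taken as the sum of the probabilities of all intersections of `a` events of the first family with `b` events of the second.

```python
from itertools import combinations
from math import comb


def prob(weights, sets):
    """Probability of the intersection of the given events (whole space if none)."""
    points = set(range(len(weights)))
    for e in sets:
        points &= e
    return sum(weights[x] for x in points)


def S2(weights, fam1, fam2, a, b):
    """Sum of P(intersection) over all choices of a events of fam1 and b of fam2."""
    return sum(prob(weights, c1 + c2)
               for c1 in combinations(fam1, a)
               for c2 in combinations(fam2, b))


def joint(weights, fam1, fam2, j1, j2, at_least):
    """Exactly (at_least=False) or at least (at_least=True, j1, j2 >= 1)
    j1 events of fam1 and j2 events of fam2, by the alternating sums."""
    shift = 1 if at_least else 0
    total = 0.0
    for v1 in range(len(fam1) - j1 + 1):
        for v2 in range(len(fam2) - j2 + 1):
            coeff = comb(j1 + v1 - shift, v1) * comb(j2 + v2 - shift, v2)
            total += ((-1) ** (v1 + v2) * coeff
                      * S2(weights, fam1, fam2, j1 + v1, j2 + v2))
    return total


def joint_bruteforce(weights, fam1, fam2, j1, j2, at_least):
    """Same probability obtained by counting, point by point."""
    ok = (lambda n, j: n >= j) if at_least else (lambda n, j: n == j)
    return sum(w for x, w in enumerate(weights)
               if ok(sum(x in e for e in fam1), j1)
               and ok(sum(x in e for e in fam2), j2))
```

With `at_least=True` the sketch requires `j1` and `j2` to be at least 1, since the coefficient $(j+\nu-1; \nu)$ is not defined by `math.comb` for $j = 0$.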
Corollary I. If (39) is true for all sets of possible values of
$\nu^{(g)}, \nu^{(g+1)}, \cdots, \nu^{(p)}$, then


k(o)—j (9) k(p)—j (p) 


Pi ene) = FS ... EF (~n"* 


v(a)=0 v(P)=0 
45 Ty] 7 » 8 Fi «(3 8 8 8 8 8 
( ) Il (kos 5, y' 4 I] [gj — i sv \(kO sj -(s) + y' ] 
s=g sak 


are e*G, vite y?)). 


Corollary II. If (40) is true for all sets of possible values of $\nu^{(1)}, \cdots, \nu^{(p)}$,
then


KA); CL) k(p)—j (p) 
(h). 


Gar FayQeP Qe’) = De wee Qe (ayer tei 


y(l)=—o y(P)=0 


” TT A s(s) , TT de hd 8 . «(3 8) 8 8), s(8) 8 
- IT a? = 7°59 a — 15 eK 55? + ¥®)) 
= sg 3= 


PIgh ry, +2 VP G GY, oe 


Let us again consider the experiments (19), and let us assume that
$E^{(i)}$, $(i = 1, \cdots, r)$, has as its mutually exclusive results

(47) $O^{(i)}_{ts}, \quad (t = 1, \cdots, k^{(s)}); \quad (s = 1, 2).$

Let $O_{ts}$ be the event which occurs if, upon performance of the experiments
(19), at least one of the events $O^{(1)}_{ts}, O^{(2)}_{ts}, \cdots, O^{(r)}_{ts}$ occur, and let $\bar{O}_{ts}$ be the
event which occurs if and only if $O_{ts}$ does not occur.

We may state the probability that the event $E_1$, which occurs if and only if
at least $j^{(1)}$ of the events $O_{t1}$, $(t = 1, \cdots, k^{(1)})$, occur, and the event $E_2$, which
occurs if and only if at least $j^{(2)}$ of the events $O_{t2}$, $(t = 1, \cdots, k^{(2)})$, occur, both
occur.

It is apparent that

(48) $P(E_1 E_2) = 1 - P(\bar{E}_1) - P(\bar{E}_2) + P(\bar{E}_1 \bar{E}_2),$

where $\bar{E}_s$ is the event which occurs if and only if $E_s$ does not occur, $(s = 1, 2)$.
From Theorem III


j(s)-1 


va P(E.) — 2 (De® al j° + y”-. “sa” _j” re ym r% 1) 


where 
k(s) 


nay -—j” + »” + 1) Sai 


@ye° OK (e)—7(ed4y(s)41=1 
(50) arc aRle)—7 (8) 4) (8)41 


I {1 on P(O4;s) aii? Peet gtrgcteragel (s — i, 2). 


From Theorem VI 


j(D—1 (2-1 


2 
50) PB) = ee (wr TL a? = 5 +950) 


p(l)=Q y(2)=0 oat 
med) «(1) (i 4 (2) -(2) (2) 
BP = fF? ge™ +1, 8° — 7” +o” 4 0), 
where 
(KO); 7 )—yQ)-1) (e020; 75 (2) (2-1) 
(9 


Sk a i pe yD i, Ke? og py) +1)= au 


ig=1 
Plgh2(k -j” + yd ‘ 1, je sal i 4 y? + 1)], 


and 


Pl? (kh —j” + y i 1, kh —j” + »™ + 1)] = 
r ( kOL)~j 1) 49 (1) 41 . 2) > 
I] - XR OD- PCO; 
i=l v=1 
the subscripts a,, (v = 1,---,k? — j@ + v® + 1), being those of the i." 
selection of Kk” — j@ + v® + 1 events from k" events, and the subscripts 
Bu, (u 1,---,k°? — j® +» + 1), being those of the ig" selection of 
hk? — j® +, + 1 events from k” events. 

The desired probability is then obtained by substituting from (49) and (50) 
into (48). The procedure is perfectly general, and applies directly to situations 
in which $p > 2$.

We shall now investigate the results obtained by requiring that the events
considered satisfy a relation of implication.

Let the subsets of $\Omega$

(51) $E_{1s}, E_{2s}, \cdots, E_{ks}, \quad (s = 1, \cdots, p),$

be elements of A, and let

(52) $E_{is} \subset E_{it}, \quad (i = 1, \cdots, k),$

if $s < t$.
It follows that

(53) $P(E_{is} E_{it}) = P(E_{is}), \quad (i = 1, \cdots, k), \quad (s < t).$

Let $j_1 \le j_2 \le \cdots \le j_t$ and let

(54) $Q_t = \prod_{s=1}^{t} \prod_{i=1}^{j_s} E_{is}, \quad (t = 1, 2, \cdots, p).$

Let $j_1 \le j_2 \le \cdots \le j_t$ and let

(55) $\bar{Q}_t = \prod_{s=1}^{t} \prod_{i=j_s+1}^{k} \bar{E}_{is}, \quad (t = 1, 2, \cdots, p).$


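From (52) and (53), it follows that

(56) $P(Q_t \bar{Q}_t) = P\Big(\Big[\prod_{s=1}^{t} \prod_{i=j_{s-1}+1}^{j_s} E_{is}\Big] \Big[\prod_{s=1}^{t-1} \prod_{i=j_s+1}^{j_{s+1}} \bar{E}_{is}\Big] \Big[\prod_{i=j_t+1}^{k} \bar{E}_{it}\Big]\Big), \quad (j_0 = 0), \quad (t = 1, 2, \cdots, p).$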
Let $j_1 \le j_2 \le \cdots \le j_p$ and for each value of $s$, $(s = 1, \cdots, p)$, consider a
selection of $j_s + \nu_s$ events of second subscript $s$ from (51). Let the $p$ selections
thus obtained be such that

$j_s + \nu_s \le j_{s+1}, \quad (s = 1, 2, \cdots, p), \quad (j_{p+1} = k),$

and if $E_{is}$ is one of the events of the selection of events of second subscript $s$
then the fact that $t > s$ implies that $E_{it}$ is one of the events of the selection of
events of second subscript $t$.

From (52) and (53), the probability of the occurrence of all the events of the
$p$ selections thus obtained is a function of $j_p + \nu_p$ events, $\mu_s$ of which are of
second subscript $s$, $(s = 1, \cdots, p)$, where

(57) $\mu_1 + \mu_2 + \cdots + \mu_s = j_s + \nu_s, \quad (s = 1, \cdots, p),$

and for a given set of values of $j_1, j_2, \cdots, j_p$ the $\mu_s$ and $\nu_s$ determine one another
uniquely, $(s = 1, \cdots, p)$.
For a definite set of values of $j_1, \cdots, j_p$ and $\mu_1, \cdots, \mu_p$, or $j_1, \cdots, j_p$ and
$\nu_1, \cdots, \nu_p$, there will be

$(j_{s+1} - j_s; \nu_s) = (j_{s+1} - j_s; j_{s+1} - \mu_1 - \cdots - \mu_s), \quad (s = 1, \cdots, p), \quad (j_{p+1} = k),$

possible distinct selections of $j_s + \nu_s$, $(s = 1, \cdots, p)$, events of second
subscript $s$, $j_s$ of which are preassigned, from $j_{s+1}$ events, $(s = 1, \cdots, p)$.

Let these selections be arranged in some order for each value of $s$, $s = 1, \cdots, p$,
and let

(58) $Q_{i_1 i_2 \cdots i_p}(\mu_1, \mu_2, \cdots, \mu_p)$

be the event which occurs when for all values of $s$, $(s = 1, \cdots, p)$, the events
of the $i_s$th selection of $j_s + \nu_s$ events of second subscript $s$ all occur.¹³

13 It is understood that the $j_s$ preassigned events of second subscript $s$ are among the $j_t$
preassigned events of second subscript $t$, $(t > s)$, in the events (58).
A typical event (58) is

(59) $Q_{1 \cdots 1}(\mu_1, \cdots, \mu_p) = \prod_{s=1}^{p} \prod_{i=j_{s-1}+\nu_{s-1}+1}^{j_s+\nu_s} E_{is}, \quad (j_0 + \nu_0 = 0).$

There will be, for a definite $j_s$ events of second subscript $s$, $(s = 1, \cdots, p)$,

(60) $\prod_{s=1}^{p} (j_{s+1} - j_s; \nu_s), \quad (j_{p+1} = k),$

events such as (58).
For a definite set of values of $\mu_1, \cdots, \mu_p$ there will be, for each value of $s$,
$(s = 1, \cdots, p)$,

$(k - \mu_1 - \cdots - \mu_{s-1}; \mu_s),$

possible distinct selections of $j_s + \nu_s$ events of second subscript $s$, $j_{s-1} + \nu_{s-1}$
of which are preassigned, from $k$ events, $(s = 1, \cdots, p)$.
Let these selections be arranged in some order for each value of $s$,
$(s = 1, \cdots, p)$, and let

(61) $q_{i_1 i_2 \cdots i_p}(\mu_1, \mu_2, \cdots, \mu_p)$

be the event which occurs if and only if, for all values of $s$, the events of the
$i_s$th set of $j_s + \nu_s$ events of second subscript $s$ all occur, $(s = 1, \cdots, p)$, and
the first subscripts of the events of the $i_s$th set of events of second subscript $s$
are among the first subscripts of the events of all the selections of events of
second subscript greater than $s$, $(s = 1, \cdots, p)$.

There will be

(62) $(k; \mu_1, \mu_2, \cdots, \mu_p)$

events (61) which may thus be obtained.

Theorem VII. The probability that of the $pk$ events (51) the first $j_s$ events of
second subscript $s$ occur and the remaining $k - j_s$ events do not occur, $s = 1, \cdots, p$,
is


4-1: Ji 


P(Q,Q,) _ zz. = ig ie (arr 


v,=0 vo=0 vp=0 


(63) 


(Ja=J1iv1) (ig =Z2+¥2) (k—=ipirp) 


>» Pldi,ig---s,(s1) M2, ++ Mp], 
i)=1 ip=1 ip=0 
where the event Q, determines the j,; — j,1 — vs—1 events of second subscript 
s, (s = 1, --- , p), which have as first subscripts all numbers 1, 2, --- , j, which 
are not among the j,_; + »,_; numbers determined by the events of lower second 
subscript than s which are contained in qi, ... i, (ui, +++ , Mp). 
Proof. Expand (56) by means of Theorem IV. 
Corollary. If, for each fixed set of values of $\mu_1, \mu_2, \cdots, \mu_p$ the terms (58),
in number (60), are all equal, then

(64) $P(Q_p \bar{Q}_p) = \sum_{\nu_1=0}^{j_2-j_1} \sum_{\nu_2=0}^{j_3-j_2} \cdots \sum_{\nu_p=0}^{k-j_p} (-1)^{\nu_1+\cdots+\nu_p} \prod_{s=1}^{p} (j_{s+1} - j_s; \nu_s)\, P[Q_{1 \cdots 1}(\mu_1, \mu_2, \cdots, \mu_p)], \quad (j_{p+1} = k).$
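Let

(65) $T(\mu_1, \cdots, \mu_p) = \sum_{i_1=1}^{(k;\, \mu_1)} \sum_{i_2=1}^{(k-\mu_1;\, \mu_2)} \cdots \sum_{i_p=1}^{(k-\mu_1-\cdots-\mu_{p-1};\, \mu_p)} P[q_{i_1 i_2 \cdots i_p}(\mu_1, \mu_2, \cdots, \mu_p)].$

If all the terms of (65) are equal, then

(66) $T(\mu_1, \cdots, \mu_p) = (k; \mu_1, \mu_2, \cdots, \mu_p)\, P[q_{1 \cdots 1}(\mu_1, \mu_2, \cdots, \mu_p)].$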


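Theorem VIII. The probability that of the $pk$ events (51) exactly $j_s$ events of
second subscript $s$, $s = 1, \cdots, p$, occur, is

(67) $P_{[j_1, \cdots, j_p]} = \sum_{\nu_1=0}^{j_2-j_1} \sum_{\nu_2=0}^{j_3-j_2} \cdots \sum_{\nu_p=0}^{k-j_p} (-1)^{\nu_1+\cdots+\nu_p} \prod_{s=1}^{p} (\mu_s; j_s - \mu_1 - \cdots - \mu_{s-1})\, T(\mu_1, \mu_2, \cdots, \mu_p).$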


Proof. If $A_{[j_1, \cdots, j_p]}$ is the subset of $\Omega$ determined by the requirement
that exactly $j_s$ of the events (51) occur $(s = 1, \cdots, p)$, then $A_{[j_1, \cdots, j_p]}$ is the
sum of

$(k; j_1, j_2 - j_1, j_3 - j_2, \cdots, j_p - j_{p-1})$

disjunct sets which may be obtained by replacing $P$ by $A$ in (56) and forming
(56) for all selections of $j_s - j_{s-1}$ occurring events from $k - j_{s-1}$ events,
$(s = 1, \cdots, p)$. By Axiom V, $P_{[j_1, \cdots, j_p]}$ is the sum of the probabilities of
these disjunct sets.

Substituting from (63), it is noted that all terms (61) which depend on the
same $\mu_s$, $(s = 1, \cdots, p)$, have the same sign and that all $T(\mu_1, \mu_2, \cdots, \mu_p)$
for which

$0 \le \nu_s \le j_{s+1} - j_s, \quad (s = 1, \cdots, p),$

appear and only those appear. Furthermore any particular term (61) will
occur in those of the terms (63) the $j_s - j_{s-1}$ occurring events of second
subscript $s$, $(s = 1, \cdots, p)$, of which contain a fixed $\nu_{s-1}$ events, the remaining
$j_s - j_{s-1} - \nu_{s-1}$ events being a subset of the $\mu_s$ events of second subscript $s$,
$(s = 1, \cdots, p)$, that actually appear in the particular term (63). Hence the
coefficient of $T(\mu_1, \cdots, \mu_p)$ is

$(-1)^{\nu_1+\cdots+\nu_p} \prod_{s=1}^{p} (\mu_s; j_s - \mu_1 - \cdots - \mu_{s-1}), \quad (\mu_0 = 0).$
Corollary. If (66) is true for all sets of possible values of $\mu_1, \mu_2, \cdots, \mu_p$,
then

(68) $P_{[j_1, \cdots, j_p]} = \sum_{\nu_1=0}^{j_2-j_1} \sum_{\nu_2=0}^{j_3-j_2} \cdots \sum_{\nu_p=0}^{k-j_p} (-1)^{\nu_1+\cdots+\nu_p} (k; j_1, \nu_1, j_2 - j_1 - \nu_1, \nu_2, \cdots, j_p - j_{p-1} - \nu_{p-1}, \nu_p)\, P[q_{1 \cdots 1}(\mu_1, \mu_2, \cdots, \mu_p)].$


Theorem IX. The probability that of the pk events (51) at least j, , but not more 
than jyi1 , events of second subscript s occur, (s = 1, --- ,g), and exactly j, events 
of second subscript s occur, (s = g + 1, --- , p) is 


1 1 1 
(69) re - i d+ 2, Rigs: jp) (1,02, er » 9,), 


0 63=0 
° ° ‘th e,° ° ° 
where, if a 1 in the 2’ position is denoted by 6;, (¢ = 2, --- , g), 
Rejoas.---:4g), bry ++ » 9x15 0, : , 9, Byo4ty 6+ +9 Oy3, 9, +++, 0, ++, Oyy4a, ++, dy) 
k—ip Jg+2—Ig+1 dg ti-jg dygti—Jy3—1 dyg—J 3-1-1 e-31—1 
<3 .° SS SES ES... Een 
ws i aia — Yys=dyg—)7g Ys 10 a 








(70) 





(ji +m — Ij”) - 
(Jag Png — Fay — Png — U5 Py) «+> Go + Op — Gna — P13 Op 
Ti tray ++ sdaa + 2a — drat — Prats, +++, 0, 
Fru + Pra — Svs — Pray 0s dp + Mp — Jr — Ypt)- 
Proof. We note first that there are 2’ terms in (69). Since 


“ (jy, +, = Just — Vy3—-1 — Ase,,) 


Jgti 2 
icin, = SFE ) 
(71) ri. rl» a.) r QA: Ng ig+1° **JIp)) 
Ag=ig Ag=j2 M=/1 


the theorem may be proved by a process of repeated summation. From (67) 
and (71) 


y1) 
Po: 


A2 Aa > ny 
(Ar + v5 v1)(A2 + v2 — Aa — ¥15 2) +++ Jp + Yep — Jp — Ppa; YD 
+i m,-en~de mee 
For fixed values of Xz, A3, «+: , A, there will occur in (72) all terms 
(73) T(ji + Bi, 2 + v2 — jr — Bry +++ Jo + ¥p — Joa — Yp-), 
(8, =0,---,»—h), OSS —d), (8 =2,---,p), 


(72) 


(ore = Joos 8 = 1,---, Dp — 9); 
and any definite term (73) will occur in all 


(74) 


Py; 
Ci +@,A2,++ +s Jp) 
for which
$0 \le \alpha \le \beta_1.$
In (74), the definite term (73) will have coefficient
(—1)PePeF CG, Ba 5 ja + x) (Ae + v2 — jr — Bi 5 2) 
(75) ++ (jp + vp — joa — M1, %), (a2 = 0,1,---, Ar), 
(8, = 0, --- , re — ji). 

Hence, in (72) the definite term (73), will have coefficient 
(—1)*8* (5, + By — 15 Bi)(Xe + v2 — ju — Br 5 2) 

++ Gp + ep — Joa — Yp-1 3 Yr); 
and 
(76) PY sip) = Rag: --ip(1). 
We now evaluate 
7) Pi, = DS Pay: 

Ao=72 

For any fixed values of \3, --- , A, , there will occur in (77) all terms 


T(ji + Bi, jo + Be — fi — Bi, As + 3 — jo — fr, 
-yJo + Vp — Jr = Yp-1); 


(78) 


for which either 0 < B <dA3 —fo3OS Bi Sfp—fp—lorhi=jf—-f~Arty, 
0O<y7 < As — p2e30 < Be Ss — fo — ¥. 

Let 0 < Bi < jo — ji — 130 < Bo < As — je. Then the term (78) will occur 
in all 


(79) iiss 
such that 
0O<a< ke. 
In (79), (78) will have coefficient 
(— 1) tPeratret 9G, + By — 1; Bi)(je + Be — jr — Bi — 1; Be — a) 
(As + vs — jo — Bo 3 vs) «++ (Jp + ¥p — Joa — Mp1; ¥p)- 
Hence in (77), (78) will have coefficient 
(1) *hetreF FG, +B, — 1; Bi) (je + Bs — jr — Bi — 1; Be) 
(As + vs — jo — Bo; v3) +++ (Jp + ¥p — Jr — Yr j Yn); 
(6: = 0,--- ,j2 — jr — 1), (Bo = 0, --- , As — je), 
(v, = 0, --+ , Ast — As), (s = 3,---, 1); 
(Nore = jots), (8 = 1,---,p — g). 


(80) 
Now let $\beta_1 = j_2 - j_1 + \gamma$; $0 \le \gamma \le \lambda_3 - j_2$; $0 \le \beta_2 \le \lambda_3 - j_2 - \gamma$. Then the
term (78) will occur in all terms (79) such that

$\gamma \le \alpha \le \beta_1,$

and in (79), (78) will have coefficient (80). Summing for $\alpha$, $(\alpha = \gamma, \cdots, \beta_1)$,
we obtain as the coefficient of (78) in (77)


0, if Bo > 7) 
and 


(—1)P 47895, + By — 1; Bi)(As + os — ji — Bi 3 ¥) 
ee (jp + vp — Jr-1 — Vp-1 5 Vp), if PB. = 7; 


Hence 
(82) PO?) = Rag---in(1, 1) + Rag---i(1, 0). 


If we examine (82), we note that the result of summing with respect to $\lambda_2$
has been the replacement of (76) by two sums which are similar to (76) in that
the next summation index, in this case $\lambda_3$, occurs in exactly two limits of
summation. If it can be shown that the two sums which occur in (82) each result
in a pair of sums after summation with respect to $\lambda_3$, or more exactly if


As+2 


Rowss.---dp (1, O2, +++, 6.) 


(83) saa ato 


—_ Rass ahh 02, ae ee 6, 1) a Re agthls 2, cee 4, 0) 


then the proof will be completed. 

Since the truth of (83) may be demonstrated in exactly the same way in 
which (82) has been shown to be true, the theorem is proved. 

Corollary. If (66) is true for all sets of possible values of $\mu_1, \mu_2, \cdots, \mu_p$,
then


Reseas.---:tg9 ly br, > * > By, 59, -++ 50, 5,041, -++,6,,,0, -++0,-- + Oyntty ee 
k—ip Jg+2—Jgt+1 Jgti—ig JygtimJy3-1 jsi1 1 
at tw wn as 
¥p=0 ¥g+1=0 ¥g=0 Vys=lys—I 73 v,=0 
. XV . . 
(ju omy — 15 91) +++ Gag + vg — Frat — Pn — 1; Vrs) 
(jr4 + Vyg — Jvs —~ Yrs — 1; Vs) ow (jp + Vp — Jp—1 — Mp1 Vp) 
(k3 ji + Viy*** yJrys3 + Vy3 — Jys—l — Vys—ly Irs 
+ up ~ Pea es ** cde FO Khe Vp-1) 
Plq.. lj TH M1, °°* sIvs 55 ~~ Jog-t — Prats 0,---,9, 
Peg eg Jen ns *** oe Pe Wd Vp]. 
Let us again consider the experiments (19) and let $E^{(i)}$ have as possible results

$O^{(i)}_{js}, \quad (j = 1, \cdots, k), \quad (s = 1, 2), \quad (i = 1, 2, \cdots, r).$

Let

$O^{(i)}_{j1} \supset O^{(i)}_{j2}, \quad (j = 1, \cdots, k),$

i.e. $O^{(i)}_{j1}$ occurs whenever $O^{(i)}_{j2}$ occurs. Furthermore let the outcomes

$O^{(i)}_{11}, O^{(i)}_{21}, \cdots, O^{(i)}_{k1}$

be mutually exclusive.
Let

$\bar{O}_{js}$

occur if and only if none of

$O^{(1)}_{js}, O^{(2)}_{js}, \cdots, O^{(r)}_{js}$

occur.
We may wish to know the probability that at least $j_1$ of $\bar{O}_{11}, \cdots, \bar{O}_{k1}$ and
at least $j_2$, $(j_2 \ge j_1)$, of $\bar{O}_{12}, \bar{O}_{22}, \cdots, \bar{O}_{k2}$ occur.
From Theorem IX this probability is equal to


(85) Pp’ = R(1, 1) + R(1, 0), 


where 
jo —~ 


mao. SF 2. HG a ~ td 


vo=0 v,=0 


(je +» n a 1; ve) T (ji + V1, Je +n yn o- v1); 


RQ,0)= Ss (-)"Gi ti — Lyd Tia + vv. 


¥1™32—31 


From (63) 
(kyiiter) (k-71—-¥1372+%2-71-71) 
(86) T(ji t+ 1, j2 + v2 —ji— mn) = 2 2 
i1= io= 
P [Gi,.(j1 + 13 je + v2 — ji — yd); 
where, from (61) 
jitvyy “si 
Gisig(Gia + 1, jo $e —ji— m1) = II Oa,1 il a Oa,25 
v= y=ji+r1 


the subscripts 


(87) 1, AZ, *** » Ajy+y) 
being the first subscripts of the $i_1$th selection of $j_1 + \nu_1$ events of second
subscript 1 from


On , On, +--+, On, 


and the subscripts 
Ajytvitl » Hiy+vj4+2» °° * » Ajgtve » 


being the first subscripts of the 72""selection of j2 + v2 events of second subscript 2, 
ji + 1 Of which are (87), from 


Or. , O2 , --- , On. 
It is easy to see that 


r jit, f J2+ve2 . 
Pldiyin(ji + 15 fp +2 — jr — rd) = II {1 ~ dX POS) - Dv Poss} 


i=1 y=jytritl 


Furthermore 


(k3 j1+71) 


(88) Tht = Dd Pla(j. + wl, 


a,;=1 


where 


r jit s 
P(g + v1)] = et {2 a a Posy} 
i= p= 
Substituting from (86) and (88) into (85) the desired probability is obtained. 
It may be remarked that theorems which have the same relation to Theorems 
VII, VIII, and IX that Theorems IV, V, and VI have to Theorems I, II, and 
III may be obtained without much difficulty.
REPLY TO MR. WERTHEIMER’S PAPER
Richmond T. Zoch


The attainment of rigor both in applied as well as pure mathematics is a slow 
process, and for this reason criticism of my paper, if constructive, is welcomed. 

Properties like continuity, differentiability, and dimensionality are local 
properties, that is to say a function may be continuous or differentiable over a 
certain range but not outside this range, or otherwise a function may be con- 
tinuous or differentiable over a given range except for singular points. 

The presence of singularities in functions does not necessarily cancel their 
utility. Thus the function y = tan x contains points where it is discontinuous, 
but ordinarily it is regarded as a continuous function and the presence of these 
singular points seldom handicaps one when working with this function. Similarly,
the function $f = \bar{x} - \frac{\mu_3}{2\mu_2}$ is a function which satisfies all four Axioms as
stated in Whittaker and Robinson’s book and expresses the mode of Pearson’s
Type III curve as a symmetric function of the measures. The fact that this
function is not differentiable along the line $x_1 = x_2 = x_3 = \cdots = x_n$ will never
handicap the investigator, for unless the frequency distribution is clearly skew
the Type III curve would not be used to represent it.

It seems that Mr. Wertheimer bases nearly all his criticisms on the tacit
addition of the word “everywhere” to Axiom IV as stated in Whittaker and
Robinson’s book. The word “everywhere” is not in the statement of Axiom
IV and I assumed nothing other than what is stated in the axiom.

If one deliberately adds the word “everywhere” to Axiom IV then nearly all
my criticisms of previous writers are incorrect, unfair, and unjust. However,
it does not seem that clearness and rigor in mathematics are increased by
reading into an axiom a word that is not there.

Consider first the criticism in my paper which remains valid even when the
word “everywhere” is added. (Schimmack uses the word “everywhere” on
page 127 although Whittaker and Robinson do not.) Both Schimmack and
Whittaker and Robinson proceed as at the top of page 217 of the book by the
latter authors with the statement: “In this equation make $k \to 0$ then each
of the quantities $\partial f / \partial x_r$ tends to a value which is independent of the $x$’s $\cdots$ .”
This statement rests on the tacit assumption that the quantities $\partial f / \partial x_r$ are
functions of $k$. Even if such were true the use of tacit assumptions in a rigorous
proof is objectionable, but as a matter of fact these quantities are not functions
of $k$. Thus the particular proof given in Whittaker and Robinson’s book as
well as in Schimmack’s paper is altogether lacking in rigor even when the word
“everywhere” is added to Axiom IV. Both Schiaparelli’s and Broggi’s proofs
appear to be entirely rigorous if the word “everywhere” is added to Axiom IV.

In preparing my paper I assumed that no prohibition on functions which had
singular points was contained in Axiom IV. In other words, I assumed that since
the word “everywhere” did not appear there was no valid objection to
introducing and discussing functions with singularities. The functions I introduced are
everywhere continuous but they are not differentiable along the line in Euclidean
n-space defined by $x_1 = x_2 = x_3 = \cdots = x_n$. They are differentiable at every
other point in the space.

It seems to me since Axiom IV as stated in Whittaker and Robinson’s book 
does not exclude functions which are not everywhere differentiable that all my 
criticism is fair and just, and moreover nearly all my statements are correct. 
Mr. Wertheimer is entirely correct in pointing out that the words “everywhere” 
on page 181 of my paper are contradictory. As a matter of fact the whole 
paragraph beginning with line 7 on page 181 appears to me, on reexamining it, 
to be unsatisfactory. Except for this single paragraph I believe my paper to 
be rigorous, but I welcome further criticism. 

Mr. Wertheimer’s conclusions in his paragraph number 4 are clearly erroneous.
To show this, consider a function of $k$. As $k \to 0$ any one of three situations
may arise, namely: (1) The function may become infinite, (2) the function
may become indeterminate, that is it may take on any value whatever,
(3) the function may approach a unique finite value independent of $k$. Neither
Schimmack nor Whittaker and Robinson nor Mr. Wertheimer has established
as a definite fact that the particular type of function here in question approaches
a unique finite value independent of $k$ as $k \to 0$. The truth of the matter is that
this conclusion cannot be established because the function in question does not
involve $k$ either explicitly or implicitly.

In conclusion there are two things I wish to emphasize. First, even when
the word “everywhere” is added to Axiom IV, the proof given in Whittaker
and Robinson’s book is faulty, but if one consults the references given there
in the footnotes he will find two other proofs which are rigorous with this
addition to Axiom IV. Second, the mode of a skew bell-shaped Pearson
Frequency Curve satisfies all four axioms as stated in Whittaker and Robinson’s
book, and the fact that these expressions for the mode are not differentiable
along a certain line is never a handicap to the statistician.